


Psychometrika 





CONTENTS 


THE RELIABILITY COEFFICIENT . . . . . . . 75
T. L. KELLEY


REGRESSION FALLACIES IN THE MATCHED GROUPS EXPERIMENT . . . . . . . 85
ROBERT L. THORNDIKE


UNIQUE TYPES OF ACHIEVEMENT TEST EXERCISES
MAX D. ENGELHART


CONTRIBUTIONS TO THE MATHEMATICAL THEORY OF HUMAN RELATIONS: V.
N. RASHEVSKY


AN APPRAISAL OF THE VALIDITY OF THE FACTOR LOADINGS EMPLOYED IN THE CONSTRUCTION OF THE PRIMARY SOCIAL ATTITUDE SCALES
LEONARD W. FERGUSON


RESPONSE RELAY
J. E. P. LIBBY


NOTE ON THE COMPUTATION OF BISERIAL r IN ITEM VALIDATION
PHILIP H. DUBOIS








VOLUME SEVEN JUNE 1942 NUMBER TWO 











ANNOUNCEMENT 


In order to reduce the computational labor of applying factor 
analysis to large batteries of tests, computational systems have been 
developed which make use of punched card equipment and other ma- 
chines which are specially designed for the purpose. Detailed descrip- 
tions of these systems are being made available on micro-film for use 
outside the University of Chicago Psychometric Laboratory. The first 
two films are available as listed below. A third manuscript in the 
process of preparation gives the details of an integrated system for 
the computation of large tables of intercorrelations. 

Micro-film copies of the following manuscripts are distributed by 
The University of Chicago Libraries: 


The centroid method of factor analysis by punched cards. 
Negative No. 1623. Approximate cost: 75 cents. 
The machines necessary to apply this system are: 
Key punch, 
Sorter, 
80 counter alphabetic tabulator, 
Summary punch, 
Multiplying punch. 


Factorial rotation of axes with graphs made by machine. 
Negative No. 1626. Approximate cost: 70 cents. 

The machines necessary to apply this system are:

Key punch, 

Sorter, 

Alphabetic tabulator with: 
88 type bars, 

12 X-distributors and class selectors (at least 3 class selectors), 
2 digit selectors. 


LEDYARD R. TUCKER. 














PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


THE RELIABILITY COEFFICIENT 


TRUMAN L. KELLEY 
HARVARD UNIVERSITY 


The reliability coefficient is unlike other measures of correlation 
in that it is a quantitative statement of an act of judgment (usually the test maker’s) that the things correlated are similar measures.
Attempts to divorce it from this act of judgment are misdirected, 
just as would be an attempt to eliminate judgment of sameness of 
function of items when a test is originally drawn up. A “coefficient of coherence,” entirely devoid of judgment, measuring the singleness of test function, is proposed as an essential datum with reference to a test, but not as a substitute for the similar-form reliability coefficient.


The student of statistics and psychological measurement is aware 
that a reliability coefficient is a correlation coefficient having certain 
special properties and a certain special meaning. Mathematically the reliability coefficient $r_{11}$ of scores $X$ is such that $0 \le r_{11} \le 1$. The maximum non-chance correlation that the $X$ measures can have with any other conceivable set is $\sqrt{r_{11}}$, not 1. Knowing $r_{11}$ for a group consisting of a narrow range of talent in which the standard deviation is $\sigma$, and knowing that the test is equally excellent throughout a wider range wherein the standard deviation is $\Sigma$, the reliability coefficients for these two ranges are connected by the equation

$$\sigma\sqrt{1 - r_{11}} = \Sigma\sqrt{1 - R_{11}} .$$

These are three important properties possessed by the reliability coefficient, but not by the correlation coefficient. Let us examine the antecedent logic which has led to these and other important special properties.
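The range relation above can be solved for the reliability in the wider range. The following is a minimal Python sketch, assuming (as the text does) that the test is equally excellent in both ranges, so that $\sigma\sqrt{1 - r_{11}}$ takes the same value in each. The function names and the figures .60, 8 and 12 are hypothetical and were chosen only for illustration.

```python
import math


def reliability_in_wider_range(r_narrow: float, sd_narrow: float, sd_wide: float) -> float:
    """Solve sd_narrow * sqrt(1 - r_narrow) = sd_wide * sqrt(1 - R) for R."""
    return 1.0 - (sd_narrow / sd_wide) ** 2 * (1.0 - r_narrow)


def max_nonchance_correlation(r11: float) -> float:
    """Upper bound on the correlation of the scores with any other set: sqrt(r11)."""
    return math.sqrt(r11)


# Hypothetical figures: r11 = .60 in a group with SD 8; the wider range has SD 12.
r11_wide = reliability_in_wider_range(0.60, 8.0, 12.0)
print(round(r11_wide, 4))
print(round(max_nonchance_correlation(0.60), 4))
```

With these figures the reliability rises to about .82 in the wider range, while the ceiling on any non-chance correlation in the narrow group is about .77 rather than 1.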

If we have a score on a single unique item, any correlation, between the limits of −1 and 1, of it with some other measure is conceivable, but the concept of reliability does not attach to it. If we can
conceive of a paired item that measures the same function, then the 
concept of uniqueness does not exist. Thus, unlike the correlation co- 
efficient, which is merely an observed fact, the reliability coefficient 
has embodied in it a belief or point of view of the investigator. Con- 
sider the score resulting from the item, “Prove the Pythagorean theo- 
rem.” One teacher asserts that this is a unique demand and that there 
is no other theorem in geometry that can be paired with it as a simi- 
lar measure. It cannot be paired with itself if there is any memory, 
conscious or subconscious, of the first attempt at proof at the time 


75 











76 PSYCHOMETRIKA 


the second attempt is made, for then the mental processes are clearly 
different in the two cases. The writer suggests that anyone doubting 
this general principle take, say, a contemporary-affairs test and then 
retake it a day later. He will undoubtedly note that he works much 
faster and that the depth and breadth of his thinking are much less; he simply is not doing the same sort of thing as before.

The teacher who considers the proof of the Pythagorean theorem 
to be a unique activity is entitled to his view. It is a sound view for 
many purposes, but so is that, for other purposes, of the student who 
considers it as evidence of a more general ability. To this latter the 
score possesses a certain reliability and is more or less indicative of 
the general function that he is interested in. The writer has long 
noted that statisticians who approach their subject through pure 
mathematics give little or no concern to reliability coefficients. Is not 
the reason simply that they are interested in facts and relationships 
and not in the attitude that an investigator has toward a certain meas- 
ure? 

We conclude that a belief that two or more measures of a mental 
function exist is prerequisite to the concept of reliability, and further, not only that they exist but that they are available before a measure of reliability is possible. We pose the question: what function of the two sets of measures $X_1$ and $X_2$, gotten by twice measuring the same individuals, and conceived of as tapping the same fundamental ability, is the best measure of reliability? Further, either $X_1$ and $X_2$ must be
judged a priori to be equally trustworthy measures of this ability or 
the one be judged some number of times as excellent as the other, as, 
e.g., might a 90-item test be judged to be nine times as excellent as a 
ten-item test, the other considerations about the items being equal. 
This act of a priori judgment is inherent and, though it can be voided 
so far as combination of items is concerned by fractionizing the meas- 
ure, this only changes the size of the element upon which the judg- 
ment is made. This element can never be made smaller than the single 
test item, and it presumably should ordinarily not be made as small 
as this, for the judgment that item 1 measures the same ability as 
item 2 would seem to be less within the capacity of the human mind 
than that, say, the ability measured by a first set of 20 items, chosen 
according to certain principles and rules, is the same as that measured 
by a second set chosen by the same principles and rules. In connection 
with the following mathematical development, the $X_1$ and $X_2$ measures
are judged to be equally excellent measures. The student can readily 
modify this treatment to cover the case where the one is judged some 
number of times as excellent as the other. 














TRUMAN L. KELLEY 17 


Let the $X_1$ measures for the $N$ individuals be $X_{a1}, X_{b1}, \ldots, X_{n1}$ and the paired $X_2$ measures be $X_{a2}, X_{b2}, \ldots, X_{n2}$. If $X_{a1}, X_{b1}, \ldots, X_{n1}$ are entitled to any credibility, it is because the differences shown between them, $(X_{a1} - X_{b1}), (X_{a1} - X_{c1}), \ldots, (X_{i1} - X_{j1}), \ldots$, are credible. This seems to the writer the most primitive or fundamental concept of trustworthiness. Let $d_{ab} = X_{a1} - X_{b1}$, etc. Of the $N^2 - N$ differences we cannot ask how many are believable and how many are not, because the issue is quantitative, but we can ask what proportion of the variance of these differences, $V_d$, is trustworthy or predictable from a knowledge of the true differences. Let us call a difference predicted from a true difference $\bar d$; so, if $d_\infty$ is the true difference, we have

$$\bar d = r_{d d_\infty} \frac{\sigma_d}{\sigma_{d_\infty}}\, d_\infty ,$$

and $V_{\bar d}/V_d$ would yield this fundamental proportion. We do not have true difference measures available, but we do have equally excellent difference measures $D$ ($D_{ab} = X_{a2} - X_{b2}$, etc.), and can actually compute

$$\bar d = r_{dD} \frac{\sigma_d}{\sigma_D}\, D .$$

This will immediately yield $V_{\bar d}/V_d$ for prediction from $D$, and by means of certain very plausible assumptions that further sets of $X$ measures are conceivable, we can obtain an estimate of $V_{\bar d}/V_d$ for prediction from the true differences, as will be illustrated.
Let us compute $r_{dD}$. We first note that

$$d_{ij} = X_{i1} - X_{j1} = (X_{i1} - M_1) - (X_{j1} - M_1) = x_{i1} - x_{j1},$$

the $X$'s being raw scores and the $x$'s deviations from the mean scores. Accordingly, any function of $d_{ij}$ and $D_{ij}$ is independent of differences between $M_1$ and $M_2$. There is no requirement in the fundamental measure that we seek that $M_1 = M_2$. Since $d_{ji} = -d_{ij}$ and $D_{ji} = -D_{ij}$, the means of the $d$'s and of the $D$'s are zero, and

$$r_{dD} = \frac{S\, d_{ij} D_{ij}}{(N^2 - N)\,\sigma_d \sigma_D},$$

in which $S$ is a summation of $N^2 - N$ terms when $i \ne j$, but it may be looked upon as a summation of $N^2$ terms if we do not impose the restriction $i \ne j$, for the inclusion of the $N$ null terms ($d_{ii} = 0$) will not affect the sum. For the variance of the $d$'s, we have


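$$
\begin{aligned}
(N^2 - N)\, V_d = S\, d_{ij}^2 &= \sum_{i=1}^{N} \Big[ \sum_{j=1}^{N} d_{ij}^2 \Big] \quad \text{(the null terms being included)} \\
&= \sum_{i=1}^{N} \big[ (x_{i1}^2 + x_{a1}^2 - 2x_{i1}x_{a1}) + (x_{i1}^2 + x_{b1}^2 - 2x_{i1}x_{b1}) + \cdots \big] \\
&= \sum_{i=1}^{N} \big[ N x_{i1}^2 + N V_1 \big] \quad \text{(in which } V_1 \text{ is the variance of the } x_1 \text{ measures)} \\
&= 2N^2 V_1 .
\end{aligned}
$$

The third line uses $\sum_j x_{j1} = 0$ and $\sum_j x_{j1}^2 = N V_1$. By very similar steps we obtain for the covariance

$$(N^2 - N)\,\text{Covariance} = S\, d_{ij} D_{ij} = 2N^2 \sigma_1 \sigma_2 r_{12},$$

so that finally

$$r_{dD} = r_{12}.$$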


We thus see that the usual split-half, or similar-form, reliability coefficient is a precise measure of the extent to which differences in the $X_1$ scores are predictable by a measure of this same degree of excellence, for $X_2$ is, according to judgment, such a measure. The issue of “correlation between errors” has not been involved. Whether there is or is not such a correlation does not alter the fact that the reliability coefficient, $r_{12}$, is the correlation between $d$ and $D$.
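The identity $r_{dD} = r_{12}$, together with $S\, d_{ij}^2 = 2N^2 V_1$, can be checked numerically by forming all $N^2 - N$ ordered differences directly. The following is a minimal Python sketch. The six pairs of scores are invented for illustration, and `pearson` is a plain product-moment correlation written out so that the block needs only the standard library.

```python
import math
from itertools import permutations


def pearson(u, v):
    """Product-moment correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ssu = sum((a - mu) ** 2 for a in u)
    ssv = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(ssu * ssv)


def difference_correlation(x1, x2):
    """Correlate d_ij = x1[i] - x1[j] with D_ij = x2[i] - x2[j] over all ordered pairs i != j."""
    pairs = list(permutations(range(len(x1)), 2))
    d = [x1[i] - x1[j] for i, j in pairs]
    D = [x2[i] - x2[j] for i, j in pairs]
    return pearson(d, D)


def variance_identity(x):
    """Return (S d_ij^2, 2 N^2 V_1); the derivation says these are equal."""
    n = len(x)
    mean = sum(x) / n
    v1 = sum((a - mean) ** 2 for a in x) / n  # variance with divisor N, as in the text
    ss_d = sum((x[i] - x[j]) ** 2 for i, j in permutations(range(n), 2))
    return ss_d, 2 * n * n * v1


# Invented scores of six individuals on two similar forms.
form1 = [12, 15, 9, 20, 17, 11]
form2 = [14, 16, 10, 18, 19, 9]

print(pearson(form1, form2), difference_correlation(form1, form2))
print(variance_identity(form1))
```

Any pair of score lists gives the same two correlations. The pairwise form only makes explicit the reading of $r_{12}$ as a statement about differences.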

Let us now assume that further measures of the same excellence as $X_1$ could be constructed, given, and averaged so that the $X_1$ scores could be paired with $X_\infty$, true scores. Then we find $r_{x_1 x_\infty} = \sqrt{r_{12}}$, which the writer has called an “index of reliability,”* and some of the properties of which he has elsewhere noted.†

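Since the correlation between observed and true differences equals $r_{x_1 x_\infty} = \sqrt{r_{12}}$ (by the same steps that showed the correlation between $d$ and $D$ to be $r_{12}$), we then obtain $V_{\bar d}/V_d = r_{12}$. This informs us that the ordinary (split-half or similar-form) reliability coefficient is a precise statement of the proportion of the variance of the observed differences in the $X_1$ scores that is real, that is, attributable to real differences in the ability measured.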
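A small simulation can illustrate $V_{\bar d}/V_d = r_{12}$. Beyond what the text states, it assumes the classical model in which each form is a true score plus an independent error of equal spread. The seed, the sample size and the error standard deviation are arbitrary. Because the derivation above shows that correlations over all pairwise differences equal the ordinary score correlations, the sketch works with the scores directly.

```python
import math
import random


def pearson(u, v):
    """Product-moment correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


rng = random.Random(1942)
N = 20_000
ERROR_SD = 0.5  # assumed; with true-score SD 1 the population reliability is 1 / 1.25 = 0.8

true_scores = [rng.gauss(0.0, 1.0) for _ in range(N)]
x1 = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]
x2 = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]

r12 = pearson(x1, x2)                        # similar-form reliability coefficient
index_of_reliability = pearson(x1, true_scores)
proportion_real = index_of_reliability ** 2  # share of V_d predictable from true differences

print(f"r12 = {r12:.3f}, sqrt(r12) = {math.sqrt(r12):.3f}, r(x1, true) = {index_of_reliability:.3f}")
print(f"proportion of V_d that is real = {proportion_real:.3f}")
```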

We must not forget that an act of judgment (that the $X_1$ and $X_2$
measures are equally excellent measures of the same function) has 
been demanded. This act is of the same sort as that of the test maker 
in putting together two or more exercises into a single test, in doing 
which he asserts that item two is a measure of the same function as 
item one, etc. We may or may not trust his judgment in this respect 
and we may or may not trust his judgment in splitting a test into 
halves, but surely we have no warrant for trusting him to do the for- 
mer but not the latter. In fact, it should be a much less severe tax 
upon judgment to split a test with many items into comparable halves 


* A simplified method of using scaled data for purposes of testing, School and 
Society, July 1 and 8, 1916, 4, nos. 79-80. 

† The reliability of test scores, J. Educ. Res., May 1921. Also, Note on the reliability of a test, J. Educ. Psych., Apr., 1924, 15, no. 4.














TRUMAN L. KELLEY 79 


than to draw up the items in the first instance so as to measure the 
same function. 

The split-test method has also been criticized because of the assumptions involved in the Spearman-Brown step-up formula, $r_A = 2r_{12}/(1 + r_{12})$, which are that the two halves of the test are equally reliable and equally variable measures of the same thing. Small differences in reliability and variability of the halves would seem to be nicely taken care of by the following formula, due to Dr. John Flanagan, in which the subscripts 1 and 2 refer to the halves of the test:


$$r_A = \text{reliability of entire test} = \frac{4\sigma_1\sigma_2 r_{12}}{V_1 + V_2 + 2\sigma_1\sigma_2 r_{12}} .$$

However, the difference between this formula and the usual one, $2r_{12}/(1 + r_{12})$, is trifling for usual conditions.

The split-test method of computing a half-test reliability has been called indeterminate because there are many other ways of splitting than the usual way of odds vs. evens. A determinate answer would result if the mean for all possible ways were gotten, but, even neglecting the labor involved, this would seem to be objectionable, for many of these splittings would be such as to contravene the judgment of comparability. In splitting we should not seek a mathematical outcome, but a judgment outcome, and for the same logical reasons as warrant a judgment product in putting together the items of the test in the first instance. The rule for splitting can well be the same as for drawing up comparable forms: so do it that the range and nature of the functions tapped are as nearly the same in the two instances as judgment permits. In this rule the plural “functions” occurs, for it is not assumed that any test maker can write, or believe himself capable of writing, items that measure one function only, though of course his endeavor should be to do so. The writer judges that the more precise Kuder-Richardson procedures, later discussed, do well cover the case where a single function is measured by items, but this situation seems to him to be remote from practical situations. These observations argue for putting judgment into the splitting into halves, or into the building of comparable forms, involved in the computation of reliability, not for mechanizing the procedures so that judgment is taken out.

The writer believes it altogether desirable that the term “reliability coefficient” be restricted to the correlation between similar measures. Not only is this the meaning originally given the term by its deviser, C. E. Spearman, but it is the meaning necessary if the coefficient is to be a precise measure of the reality of the differences shown by the measures. Let us compare this measure with the retest correlation and the Kuder-Richardson measures.
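The Spearman-Brown step-up and Dr. Flanagan's formula discussed above can be compared directly. The following is a minimal Python sketch in which the half-test standard deviations and the correlation are hypothetical. When the halves are equally variable the two formulas agree exactly, which helps explain why the difference stays small when the halves differ only slightly.

```python
def spearman_brown(r12: float) -> float:
    """Spearman-Brown step-up of a half-test correlation: 2 r12 / (1 + r12)."""
    return 2.0 * r12 / (1.0 + r12)


def flanagan(sd1: float, sd2: float, r12: float) -> float:
    """Flanagan's formula as given in the text: 4 s1 s2 r12 / (V1 + V2 + 2 s1 s2 r12)."""
    cov12 = sd1 * sd2 * r12
    return 4.0 * cov12 / (sd1 ** 2 + sd2 ** 2 + 2.0 * cov12)


r12 = 0.70  # hypothetical correlation between the halves
for sd1, sd2 in [(5.0, 5.0), (5.0, 5.5), (5.0, 7.0)]:
    print(sd1, sd2, round(spearman_brown(r12), 4), round(flanagan(sd1, sd2, r12), 4))
```

The gap between the two values widens as the halves become less equally variable.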

In the case of the retest, if at the time of the second test there is 
any memory, conscious or subconscious, of the earlier responses, then 
certainly the mental operations being performed at the second taking 
are not the same or even similar in kind to those performed at the 
first taking. Surely if the time interval between takings is short 
enough, we can expect the differences between the scores of subjects 
upon the first test to be exactly predicted by the differences between 
the retake test scores. The numerical value of the retest coefficient of 
correlation will decrease as the time between testings is increased. It 
thus is a function of this time, but whatever this time interval and 
whatever the value of the retest correlation, there seems no logical 
reason for taking it as a measure of the fundamentally important ratio 
$V_{\bar d}/V_d$.

Kuder and Richardson* give a number of formulas, from complex 
to simple, for the computation of the reliability coefficient, all conse- 
quent to a certain “operational definition of equivalence.” They ob- 
serve that their definition is “more rigid than the one usually stated.” 
It is certainly more restrictive than the one here used, and the writer 
judges more restrictive than need be. To judge of $V_{\bar d}/V_d$ there seems no necessity that items in the paired forms be matched for difficulty, that the aggregate difficulties be matched, or even that the items separately be matched for excellence, but only that the aggregates be so matched. In their more precise formulas an $r_{ii}$, the item reliability, enters, but this is not an observed datum; it is definitive and determinable only with the aid of certain assumptions, in particular the
questionable one that “the matrix of inter-item correlations has a 
rank of one.” 

Their simplest formula [21] is

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - npq}{\sigma_t^2},$$

in which $\sigma_t$ is the standard deviation of the total test scores, $n$ the number of test items, $p$ the mean proportion of right responses upon the $n$ items, i.e., $p = M/n$, where $M$ is the mean total test score, and $q = 1 - p$. Of course, adding a number of easy items which everybody answers correctly will change neither the standard deviation nor the reliability of the test, but inspection shows that it does change the


* G. F. Kuder and M. W. Richardson, The theory of the estimation of test
reliability, Psychometrika, 1937, 2, 151-160. 














TRUMAN L. KELLEY 81 


$r_{tt}$ as given by this formula. This is easy to show algebraically, but a numerical illustration will suffice. Let us first have a 50-item test with mean 25 and $\sigma_t = 5$; then $r_{tt} = .51$. Let us now add fifty easy items which everybody answers correctly: the mean is now 75, $\sigma_t = 5$, and $r_{tt} = .25$. Surely this simplest formula is utterly suspect in spite of
the empirical agreement which the authors and others have reported 
between the values given by it and comparable-form reliability co- 
efficients. There may be conditions under which formula [21] could 
be trusted,—the empirical findings suggest that this is so,—but as the 
authors only offer it as a “foot-rule” formula, one cannot expect an 
experimental establishment of these conditions in the situations in 
which it is likely to be used. 
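Kelley's numerical illustration of formula [21] can be reproduced in a few lines. The following is a minimal Python sketch. The name `kr21` was chosen here, and the function implements the formula exactly as stated above, with $p = M/n$ and $q = 1 - p$.

```python
def kr21(n_items: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula [21]: n/(n-1) * (sd^2 - n p q) / sd^2, with p = mean/n."""
    p = mean / n_items
    q = 1.0 - p
    variance = sd ** 2
    return n_items / (n_items - 1) * (variance - n_items * p * q) / variance


original = kr21(50, 25, 5)   # 50-item test, mean 25, sigma_t = 5
padded = kr21(100, 75, 5)    # same test plus fifty items everybody answers correctly
print(round(original, 2), round(padded, 2))  # → 0.51 0.25
```

The standard deviation is unchanged by the added items, yet the value falls from .51 to .25. This is the anomaly the text points to.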

In connection with an analytical investigation of the functions 
measured by the more precise Kuder-Richardson formulas, we should 
note the major premise from which they spring. In 1939 Kuder and 
Richardson express agreement with C. Spearman in stating* that “the 
reliability coefficient is defined as the coefficient of correlation between 
one experimental form of a test and a hypothetically equivalent form.” 
However, their derivations seem clearly to be based upon another 
proposition, which in 1937 they state thus: 

“It is implicit in all formulations of the reliability problem that 
reliability is the characteristic of a test possessed by virtue of the 
positive intercorrelations of the items composing it.” That this is 
non-equivalent to Spearman’s definition can be demonstrated in con- 
nection with the data of Table I, giving inter-item covariances of the 
items composing the two forms of a test. 


TABLE I 
Variances and Covariances of Test Items 








              Form 1: Items            Form 2: Items            p = proportion of    Standard deviation
Item       a      b      c        A       B       C         right responses      of item
a         .25    .00    .00     .1875    .00     .00              .5                  .5
b                .25    .00      .00    .1875    .00              .5                  .5
c                       .25      .00     .00    .1875             .5                  .5
A                                .25     .00     .00              .5                  .5
B                                        .25     .00              .5                  .5
C                                                .25              .5                  .5





Score $X_1 = a + b + c$ and similar-form score $X_2 = A + B + C$. According to the Kuder-Richardson proposition, $X_1$ has no reliability, for positive correlation between the items is lacking. However, according to Spearman's definition $r_{12} = .75$ and $V_{\bar d}/V_d = .75$, indicating


* The calculation of test reliability coefficients based on the method of rational equivalence, J. Educ. Psych., Dec., 1939, 30.











82 PSYCHOMETRIKA 


that three-fourths of the variance of $X_1$ scores is real, or predictable
from true measures of the function in question. Let us collect various 
measures for the data of Table I. 


Similar-form reliability coefficient = .75
$V_{\bar d}/V_d$ = .75
“Coefficient of coherence,” mentioned in the next paragraph, $V_C/SV_{X_i}$ = .33
Kuder-Richardson formula [8] reliability = .58 (This formula is given by Kuder and Richardson as their most reliable.)
Kuder-Richardson formula [14] reliability = .00
Kuder-Richardson formula [20] reliability = .00
Kuder-Richardson formula [21] reliability = .00


Of course $X_1$ is not a promising measure, but its shortcoming is not lack of reliability but lack of unity, and it can be traced to faulty judgment on the part of the test maker.
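The Table I figures for $r_{12}$ and for formulas [20] and [21] can be checked from the covariance matrix alone. The following is a minimal Python sketch. The matrix is Table I with the lower triangle filled in by symmetry. Formula [20] is written in its usual form, $\frac{n}{n-1}\left(1 - \frac{\sum pq}{\sigma_t^2}\right)$, which the text cites but does not reproduce. Formula [8] is not attempted.

```python
# Table I, rows and columns in the order a, b, c (Form 1) then A, B, C (Form 2).
COV = [
    [0.25, 0.00, 0.00, 0.1875, 0.00, 0.00],
    [0.00, 0.25, 0.00, 0.00, 0.1875, 0.00],
    [0.00, 0.00, 0.25, 0.00, 0.00, 0.1875],
    [0.1875, 0.00, 0.00, 0.25, 0.00, 0.00],
    [0.00, 0.1875, 0.00, 0.00, 0.25, 0.00],
    [0.00, 0.00, 0.1875, 0.00, 0.00, 0.25],
]
P = [0.5] * 6  # proportion of right responses for every item
FORM1, FORM2 = [0, 1, 2], [3, 4, 5]


def block_sum(rows, cols):
    """Sum of the covariance entries in the given rows and columns."""
    return sum(COV[i][j] for i in rows for j in cols)


var_x1 = block_sum(FORM1, FORM1)  # variance of X1 = a + b + c
var_x2 = block_sum(FORM2, FORM2)  # variance of X2 = A + B + C
r12 = block_sum(FORM1, FORM2) / (var_x1 * var_x2) ** 0.5

n = len(FORM1)
sum_pq = sum(P[i] * (1 - P[i]) for i in FORM1)
kr20 = n / (n - 1) * (1 - sum_pq / var_x1)  # usual form of formula [20]
p_bar = sum(P[i] for i in FORM1) / n
kr21 = n / (n - 1) * (var_x1 - n * p_bar * (1 - p_bar)) / var_x1  # formula [21]

print(f"r12 = {r12:.2f}, KR20 = {kr20:.2f}, KR21 = {kr21:.2f}")
```

The sketch reproduces the listed .75, .00 and .00. The zero inter-item covariances within Form 1 drive both Kuder-Richardson values to zero, while the cross-form covariances alone fix $r_{12}$.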

Though we question the Kuder-Richardson proposition as a formulation of reliability, we should consider the idea in it very important in connection with the concept of unity or coherence of a test. Let the items of a test be $a, b, c, d, \ldots$ and the test score

$$X_1 = w_a a + w_b b + w_c c + w_d d + \cdots .$$

Let all the covariances between items be computed and a matrix formed and factorized by the Kelley* method, which preserves the initial metric given by the variables with their attached weights. If the first component of this matrix is $C$, and the sum of the variances of the weighted items is $SV_{X_i}$ (this being a precise measure of the total variance inherent in all the items), then $V_C/SV_{X_i}$ is a measure of the unity or coherence of the test. This would seem to be a very important measure and one to date altogether lacking. The writer suggests the name “coefficient of coherence” for the ratio $V_C/SV_{X_i}$. It is a measure of the morale,† or singleness of purpose, of the items constituting the test. Kuder and Richardson assume complete unity of purpose when they assume a rank of 1 for their correlation matrix of test items. It would seem far better not to make any assumption but to measure the proximity to a rank of 1 by computing $V_C/SV_{X_i}$.

* Essential traits of mental life, 1935, and Talents and tasks, Harvard Education Papers No. 1, 1940.

† T. L. Kelley, When Cease Firing Sounds, Christian Science Monitor, Nov. 8, 1941, defined morale as “the individual attitude in a group endeavor,” following which the morale of a test item is the congruence of its intent (what it measures) with that of the group of items constituting the test.

The computation of VC for a hundred-item test would involve no 
less than 100(100-1) /2 inter-item correlations, or covariances, and 
thus might well be impractical. However, if such a test were divided 
into, say, ten parts of ten items each,—the items within each part be- 
ing judged to be as homogeneous as possible (equivalent to the judg- 
ment that the parts are as heterogeneous as possible, which is the op- 
posite of the judgment made when splitting for reliability purposes), 
—only 45 covariances are now required and the determination of the 
variance of the first component of these ten parts is entirely feasible 
and this VC should be a serviceable approximation to that given by 
the 100-item analysis. Illustrative examples of the closeness of such 
approximation are, of course, needed. 
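The coefficient of coherence $V_C/SV_{X_i}$ can be sketched as the share of the summed item variances taken up by the first component. This assumes that the "first component" is the leading principal component (the largest eigenvalue) of the weighted-item covariance matrix. The Kelley method itself is not reproduced here, and power iteration is a substituted numerical technique. Apart from the Form 1 block of Table I, the example matrices are hypothetical. The last line repeats the text's covariance counts for 100 items versus ten parts.

```python
import math


def leading_eigenvalue(m, iterations=1000):
    """Largest eigenvalue of a symmetric positive semidefinite matrix, by power iteration.

    Starts from a uniform vector, so it assumes that vector is not orthogonal to the
    leading eigenvector (true for the non-negative matrices used below).
    """
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient at the converged vector


def coefficient_of_coherence(cov):
    """V_C / S V_{X_i}: first-component variance over the sum of the item variances."""
    return leading_eigenvalue(cov) / sum(cov[i][i] for i in range(len(cov)))


# Rank-one matrix (hypothetical loadings, no unique variance): coherence should be 1.
loadings = [0.5, 0.6, 0.7, 0.4]
rank_one = [[a * b for b in loadings] for a in loadings]

# Form 1 block of Table I: three uncorrelated items of variance .25.
table1_form1 = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]

print(round(coefficient_of_coherence(rank_one), 4))
print(round(coefficient_of_coherence(table1_form1), 4))

# Covariances needed: all item pairs of a 100-item test versus ten parts.
print(100 * 99 // 2, 10 * 9 // 2)
```

A value near 1 means the items are close to a rank of 1, the condition that Kuder and Richardson assume rather than measure. The Table I items give one-third, matching the .33 listed earlier.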

Other approaches to a quick determination of $V_C$ may lie in some utilization of $r_{it}$ measures, the correlations between the items and the total.











PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


REGRESSION FALLACIES IN THE MATCHED 
GROUPS EXPERIMENT 


ROBERT L. THORNDIKE 
TEACHERS COLLEGE, COLUMBIA UNIVERSITY 


This paper is concerned particularly with certain regression 
effects which appear whenever matched groups are drawn from popu- 
lations which differ with regard to the characteristics being studied. 
It is shown that regression will produce systematic differences be- 
tween these groups on measures other than those upon which they 
were specifically matched. The size and direction of these differences 
depend upon the differences between the parent populations both in
the matching and in the experimental variables and upon the corre- 
lation between the matching and experimental variables. Formulas 
are presented for estimating the expected regression effect. Several 
alternative procedures are suggested for avoiding the erroneous con- 
clusions which the regression effect is likely to suggest. 


It is not the purpose of this paper to present any scintillating 
new statistical ideas. What is said here should be in the nature of a 
reminder of old truths, rather than a message of startling novelty. 
The aim is to restate and clarify for research workers in education 
and psychology some of the errors into which they may lapse when 
they use the experimental pattern of matched groups. The matched 
groups experiment is sufficiently prevalent in education and psychol- 
ogy and the use of it sufficiently uncritical, to make it worth our while 
to enquire into certain sources of error. 

The fallacies with which we are here concerned may arise when- 
ever the measure or measures by means of which the groups were 
matched have less than a perfect correlation with the measure of the 
experimental variable which is being studied. A more limited example 
of this is found in the less than perfect correlation between a test and 
a subsequent retest with the same instrument. However, our argu- 
ment is more general than this, and holds whenever groups are 
matched upon one measure or group of measures and then studied 
with regard to their performance on other measures which do not 
have a perfect correlation with the matching variable. Since this is 
universally true in the matched-groups experiment, the points to be 
raised here are of quite general application. 

Whenever the correlation between two measures is less than 
unity, part of the variance of scores in each measure is independent 
of variance in the other measure. We can conceive of performance in 


85 











86 PSYCHOMETRIKA 


each test as made up of two parts—a part common to the two tests 
and a part unique to the particular test. Those individuals who re- 
ceive high scores on test X do so in part because they possess large 
amounts of whatever is common to X and Y, in part because they 
possess large amounts of whatever enters into X score but not into 
Y score. (We have no concern, for the present, as to what this spe- 
cific element is, i.e., whether it is “specific factor” or “error of meas- 
urement.”) Since the specific element in test X and the specific ele- 
ment in test Y are unrelated, those individuals who possess large 
amounts of the X specific will, as a group, possess just average 
amounts of the Y specific. The total group of those found to deviate 
from the average in test X in a certain amount and direction will also 
deviate from the mean in test Y in the same direction. They will not, 
however, deviate in the same amount. Whereas in test X both the 
specific and the common factor combined to produce the deviations in 
those who were selected because of their deviant X score, in test Y 
only the common factor is at work. The result is that those selected 
as falling $H$ standard deviations above the X mean will fall $H r_{xy}$ standard deviations above the Y mean, and vice versa. The Y scores will
regress towards the mean by an amount which is a direct function of 
the size of the X deviation and an inverse function of the correlation 
between X and Y. 
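A simulation makes the $H r_{xy}$ rule concrete. It assumes standardized, bivariate normal scores, which the text does not state, and it uses a narrow band around $H$ in place of an exact value. The correlation .6, $H = 1.5$, the band width and the seed are arbitrary.

```python
import math
import random

rng = random.Random(7)
R_XY = 0.6   # assumed correlation between tests X and Y (both standardized)
H = 1.5      # selection point, in standard deviations above the X mean
BAND = 0.05  # half-width of the selection band around H

selected_y = []
for _ in range(200_000):
    x = rng.gauss(0.0, 1.0)
    # Y = part shared with X plus an independent specific part, scaled so corr(X, Y) = R_XY.
    y = R_XY * x + math.sqrt(1.0 - R_XY ** 2) * rng.gauss(0.0, 1.0)
    if abs(x - H) <= BAND:
        selected_y.append(y)

mean_y = sum(selected_y) / len(selected_y)
print(f"{len(selected_y)} cases near X = +{H} SD; mean Y = {mean_y:.3f} SD; H * r_xy = {H * R_XY:.3f}")
```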

It is important to point out that the regression of scores upon a 
second test is toward the mean of the population from which the 
cases were selected, and which they truly represent. If a ninth- and 
a twelfth-grade population are tested with an intelligence test and a 
reading test, it will be found that the twelfth graders above the twelfth 
grade mean in intelligence will tend to drop down toward the twelfth 
grade mean in reading score (when scores are expressed in standard 
deviation units). However, twelfth-graders who are above the total 
group mean but below the twelfth-grade mean will tend to regress 
up toward the twelfth grade mean—not down toward the combined 
population mean. Similarly, the ninth-graders will regress toward 
the ninth-grade mean. By the same token, if we study two groups of 
10-year-olds, one group drawn from a private school catering to exec- 
utive and professional families and the other from an orphanage or 
an institution for retarded children, we must expect the deviant individuals in each group to regress toward the mean of the population from which they come, and not toward some hypothetical average of
10-year-olds in general. 

This point can be illustrated by an empirical comparison of data 
from a group of 8- and 9-year-olds and a group of 12- and 13-year- 




















ROBERT L. THORNDIKE 87 


olds. For each child in these two groups there were available an M.A. 
on Form L of the Revised Stanford Binet and a score on a brief 15- 
word vocabulary test (selected from this same material). Data on 
the two groups for the two tests were as follows: 


                            Ages 8 and 9     Ages 12 and 13
N                           185              138
Mean M.A.                   9.42 yrs.        12.65 yrs.
Mean Vocabulary Score       3.29 words       5.54 words
Intercorrelation            .77              .82


From each group were picked out all the cases with M.A. falling 
between 10-0 and 11-11. This is a group above the average M.A. of 
the younger group, and below the average M.A. of the older group. 
The average vocabulary scores for the 8 and 9 and for the 12 and 13 
groups are 4.09 and 4.41 respectively. If we reverse the procedure, 
and determine the average M.A. in each group for those having vo- 
cabulary scores of 4 and 5, we get 10.59 and 11.46 respectively. In 
each case, we see that the groups selected as matched on one variable 
are not matched on the other. The older children surpass the younger 
in each case. This is because the older group regresses toward a 
higher population mean score than the younger group. 
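A simulation can show the same effect with two populations whose means differ on both measures. The parameters (unit standard deviations, r = .8, population means of 0 and 1 on both X and Y, and a matching band from .4 to .6) are hypothetical and are not the Binet data above. Bivariate normality is assumed.

```python
import math
import random


def matched_mean_y(mu_x, mu_y, r, lo, hi, n, rng):
    """Sample n cases from a population with unit SDs and correlation r, keep those
    with lo <= X <= hi, and return (number kept, mean Y of those kept)."""
    kept = []
    for _ in range(n):
        x = mu_x + rng.gauss(0.0, 1.0)
        y = mu_y + r * (x - mu_x) + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if lo <= x <= hi:
            kept.append(y)
    return len(kept), sum(kept) / len(kept)


rng = random.Random(1942)
R = 0.8            # assumed correlation of X and Y in both populations
LO, HI = 0.4, 0.6  # matching band on X, between the two population means

n_low, mean_low = matched_mean_y(0.0, 0.0, R, LO, HI, 100_000, rng)
n_high, mean_high = matched_mean_y(1.0, 1.0, R, LO, HI, 100_000, rng)

# Regression prediction at the band centre: mu_y + r * (0.5 - mu_x).
print(n_low, round(mean_low, 3), 0.0 + R * (0.5 - 0.0))
print(n_high, round(mean_high, 3), 1.0 + R * (0.5 - 1.0))
```

Although the two groups are matched on X, the group drawn from the higher population has the higher mean on Y, because each group regresses toward its own population mean.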

In studies using matched groups, we can recognize three patterns. 

In the first pattern, two or more matched groups are assembled 
within a single population or in different sub-populations which may 
reasonably be thought to be, in all essential features, fractions of the 
same total population. For example, if our interest is to compare the 
effectiveness of three different types of materials in developing rapid 
reading, we may select from the students in a large class three groups 
that are equated in terms of initial speed of reading, and then try out 
the three variations in practice material, one on each of the three 
groups. Or if one teacher instructs three different class groups, all 
chosen in the same way from the same student body, we may make up 
one of our matched groups in each of the classes. The crucial point 
in this example is that the groups are selected from what are, to the 
best of our knowledge, equivalent populations. 

In this type of situation, regression should affect each of our 
samples in the same way. Since the samples are all taken from the 
same population, we may expect them all to regress toward the same 
population mean, and since they have the same distribution of scores 
on the matching variables, the expected direction and amount of re- 
gression is the same. There is no reason why there should be any 
systematic tendency for the regression from matching to experimental 











88 PSYCHOMETRIKA 


variable to affect one group differently from the other. Of course, 
chance fluctuations may be expected to disturb in some degree the 
exactness of the matching, but this effect should be a random one. 

The problem then reduces to determining the appropriate stand- 
ard error for evaluating the obtained difference upon the retest. The 
appropriate formula must take account of the reduction in chance dif- 
ferences between the means on the experimental variable due to the 
matching. This problem has recently been reviewed in some detail 
by McNemar.* If the matched groups have been assembled by match- 
ing pairs of individuals, the appropriate formula for the standard er- 
ror of a difference upon any measure other than that upon which they 
were matched becomes 





$$\sigma_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2 r_{12}\,\sigma_{M_1}\sigma_{M_2}} \qquad (1)$$


where $r_{12}$ is the correlation between members of a pair in the new measure and $\sigma_{M_1}$ and $\sigma_{M_2}$ are the standard errors of the two group means for the new measure. If the matching of the two groups is in terms
of distribution of scores for the groups as a whole rather than indi- 
vidual pairing, an acceptable formula for the standard error of the 
difference is 





$$\sigma_D = \sqrt{\left(\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2\right)\left(1 - r_{xy}^2\right)}, \qquad (2)$$

in which $r_{xy}$ is the correlation between the test upon which groups
were matched (y) and the experimental test (x).
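As a numerical sketch, the two formulas can be written as small Python functions. The function names and the figures in the usage lines (a standard deviation of 10, 50 cases per group, and a correlation of .60 used for both $r_{12}$ and $r_{xy}$) are hypothetical. The point is only that either kind of matching shrinks the standard error below the unmatched value $\sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$.

```python
import math


def se_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def se_diff_paired(se_1, se_2, r_12):
    """Equation (1): SE of the difference between means of pair-matched groups."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r_12 * se_1 * se_2)


def se_diff_group_matched(se_1, se_2, r_xy):
    """Equation (2): SE of the difference when groups are matched on the
    distribution of y rather than pair by pair."""
    return math.sqrt((se_1 ** 2 + se_2 ** 2) * (1 - r_xy ** 2))


# Hypothetical figures: SD 10 and 50 cases in each group, correlation .60.
se = se_mean(10.0, 50)
print("unmatched:    ", round(se_diff_paired(se, se, 0.0), 3))
print("pair-matched: ", round(se_diff_paired(se, se, 0.6), 3))
print("group-matched:", round(se_diff_group_matched(se, se, 0.6), 3))
```

An obtained difference on the retest would then be divided by the appropriate $\sigma_D$ to judge whether it exceeds what chance alone would produce.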

The second pattern is that in which we are faced with two or 
more discrete categories of individuals. The categories are differen- 
tiated by some characteristic external to but perhaps related to either 
the measure in terms of which the groups are being matched or the 
experimental variable or both. We might, for example, study a group 
of men and a group of women who were matched in performance 
upon a test of strength of grip. We might plan to work with matched 
groups of students who do and students who do not take Latin, the 
basis of matching being score on a test of English vocabulary. An- 
other possibility might be to work with a group of orphanage children 
and a group of private school children, matched for intelligence test 
score. In each of these cases we are dealing with matched groups se- 
lected from two distinct populations—populations which probably 
have quite different means upon the tests in terms of which the 
matched samples were chosen. 

In order to get a matched group when the two populations have
different mean values, we must take individuals who fall relatively
high in one population and match them with individuals who fall
relatively low in the other. Since the individuals in each group will
regress toward their own population mean, the regression in the two
groups will be different. Upon another test, our groups will no longer
be matched.

* McNemar, Quinn. Sampling in psychological research. Psychol. Bull., 1940,
37, 331-365.

FIGURE 1
Regression in Discrete Populations
[Histograms: "Men" and "Women," total populations, first test; matched
group of "Men" and "Women," mean = 20.3; retest scores, matched group
of "Men," mean = 21.0; retest scores, matched group of "Women,"
mean = 19.2.]

We can illustrate this with some artificial data constructed from 
dice throws. Let us suppose that these data represent scores upon
two strength tests. Suppose that we use seven dice, numbers 1, 2,
and 3 representing common ability in the two tests, numbers 4 and 5
representing the factor specific to the first test, and numbers 6 and 7
representing the factor specific to the second test. Score for the first
test is the number of spots showing on dice 1, 2, 3, 4 and 5; score on
the second test is the number of spots showing on dice 1, 2, 3, 6 and 7.
In this way two scores were obtained for 132 "women." Scores for a
second population of 132 "men" were obtained in the same way except
that the constant amount 5 was added to the number of spots showing 
to give each score. The distributions of scores for men and for women 
on the first test are shown as the first two histograms in Figure 1. 
The theoretical difference between the means of these two populations 
is 5; empirically, it comes out to be 4.7. 

Now a sub-group is made up in each population by selecting 
cases which can be individually matched with cases in the other popu- 
lation on the basis of score on the first test. This gives 64 matched 
pairs. The mean is, in each case, 20.3. Let us examine the second test 
scores of the 64 “men” and 64 “women” in these matched groups. 
From the last two histograms of Figure 1, we see that the “men” 
have regressed up so that their mean second test score is 21.0; the 
“women” have regressed down to a mean score of 19.2. On the second 
test the two matched groups differ in mean score by 1.8, about 40% of 
the difference between the means of their parent populations. 
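The dice experiment is easy to repeat by machine. The Python sketch below is not the author's procedure. It assumes much larger populations (50,000 of each kind instead of 132) so that sampling fluctuation is small, and it pairs cases by exact first-test score. Because the two tests share three of their five dice, the correlation between them within either population is 3/5 = .60. The matched groups should therefore separate on the second test by about (1 - .60) x 5 = 2.0 points, in line with the 1.8 obtained above.

```python
import random
from collections import defaultdict


def simulate_population(n, shift, rng):
    """Return n (test1, test2) score pairs built from seven dice.

    Dice 1-3 carry the common ability, dice 4-5 the factor specific to
    test 1, dice 6-7 the factor specific to test 2; `shift` is added to
    both scores (0 for "women", 5 for "men").
    """
    people = []
    for _ in range(n):
        d = [rng.randint(1, 6) for _ in range(7)]
        common = d[0] + d[1] + d[2]
        people.append((common + d[3] + d[4] + shift,
                       common + d[5] + d[6] + shift))
    return people


def match_pairs(group_a, group_b):
    """Pair members of the two groups who have identical test-1 scores."""
    pools_a, pools_b = defaultdict(list), defaultdict(list)
    for person in group_a:
        pools_a[person[0]].append(person)
    for person in group_b:
        pools_b[person[0]].append(person)
    matched_a, matched_b = [], []
    for score in pools_a.keys() & pools_b.keys():
        k = min(len(pools_a[score]), len(pools_b[score]))
        matched_a.extend(pools_a[score][:k])
        matched_b.extend(pools_b[score][:k])
    return matched_a, matched_b


def gap(group_a, group_b, test):
    """Mean of group_a minus mean of group_b on test 0 or test 1."""
    return (sum(p[test] for p in group_a) / len(group_a)
            - sum(p[test] for p in group_b) / len(group_b))


rng = random.Random(1942)
women = simulate_population(50_000, shift=0, rng=rng)
men = simulate_population(50_000, shift=5, rng=rng)
matched_men, matched_women = match_pairs(men, women)

print("matched pairs:", len(matched_men))
print("gap on test 1:", round(gap(matched_men, matched_women, 0), 2))
print("gap on test 2:", round(gap(matched_men, matched_women, 1), 2))
```

The seed is arbitrary. Any seed should give a first-test gap of exactly zero and a second-test gap near 2.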

If we know the means and standard deviations of the two popu- 
lations from which our matched samples are drawn, both upon the 
matching test and upon the experimental test, and the correlations 
between the two tests in each population, we can determine the amount 
of difference to be expected on the second test between the means of 
samples from each group matched on the first test. The formula is 


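$${}_A\bar{Y} - {}_B\bar{Y} = (\bar{X} - {}_A M_x)\left({}_A r_{xy}\frac{{}_A S_y}{{}_A S_x} - {}_B r_{xy}\frac{{}_B S_y}{{}_B S_x}\right) - {}_B r_{xy}\frac{{}_B S_y}{{}_B S_x}\,D + ({}_A M_y - {}_B M_y), \qquad (3)$$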


and the derivation of this is shown in the mathematical note below. 
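Equation (3) is straightforward to evaluate once the population statistics are in hand. In the Python sketch below, `Population`, `slope`, and `expected_gap` are illustrative names rather than the author's, and D is taken as ${}_A M_x - {}_B M_x$. The sketch is supplied with the theoretical statistics of the dice populations of Figure 1 (means 22.5 and 17.5, equal standard deviations, r = .60) and the matched mean of 20.3. With those inputs it gives a 2.0-point gap.

```python
from dataclasses import dataclass


@dataclass
class Population:
    """Summary statistics for one parent population (hypothetical container)."""
    mean_x: float  # M_x, mean on the matching test X
    sd_x: float    # S_x
    mean_y: float  # M_y, mean on the second test Y
    sd_y: float    # S_y
    r_xy: float    # correlation of X and Y within this population


def slope(p):
    """Regression coefficient of Y on X within one population: r_xy * S_y / S_x."""
    return p.r_xy * p.sd_y / p.sd_x


def expected_gap(x_bar, a, b):
    """Equation (3): expected A-mean minus B-mean on Y for samples matched at x_bar."""
    d = a.mean_x - b.mean_x  # D, the difference between population means on X
    return ((x_bar - a.mean_x) * (slope(a) - slope(b))
            - slope(b) * d
            + (a.mean_y - b.mean_y))


# Theoretical statistics of the dice populations: five dice per test,
# SD = sqrt(5 * 35 / 12), and r = 3/5 because three dice are shared.
sd = (5 * 35 / 12) ** 0.5
men = Population(mean_x=22.5, sd_x=sd, mean_y=22.5, sd_y=sd, r_xy=0.6)
women = Population(mean_x=17.5, sd_x=sd, mean_y=17.5, sd_y=sd, r_xy=0.6)
print(round(expected_gap(20.3, men, women), 2))
```

When the two populations have unequal slopes, the first term no longer vanishes, and the expected gap then depends on where along the X scale the matching was done.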










MATHEMATICAL NOTE 


Given: Two populations, A and B, having different population means in a
measure X. The population means are designated ${}_A M_x$ and ${}_B M_x$, respectively.
These two means differ by the amount $D = {}_A M_x - {}_B M_x$. The standard deviations
in the two populations are ${}_A S_x$ and ${}_B S_x$.

A sample has been selected from each of the populations in such a way that
the mean X score in each sample is the same. The mean score in these samples is
designated $\bar{X}$.

Required: To determine the expected difference between the means of these two
samples upon some other measure Y, when the population means and standard
deviations for Y are ${}_A M_y$, ${}_B M_y$, ${}_A S_y$, ${}_B S_y$, respectively, and the coefficients of
correlation between X and Y in the two populations are ${}_A r_{xy}$ and ${}_B r_{xy}$.
Derivation: The X score of an individual from population A may be designated
${}_A X_i$. For this individual, the predicted score on test Y is

$${}_A\tilde{Y}_i = {}_A r_{xy}\,\frac{{}_A S_y}{{}_A S_x}\,({}_A X_i - {}_A M_x) + {}_A M_y.$$


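If we sum over the $N_A$ cases in the matched sample from population A, we get,
as an unbiased estimate of the mean of the Y scores,

$${}_A\bar{Y} = \frac{1}{N_A}\sum_{i=1}^{N_A}{}_A\tilde{Y}_i = {}_A r_{xy}\,\frac{{}_A S_y}{{}_A S_x}\left(\frac{1}{N_A}\sum_{i=1}^{N_A}{}_A X_i - {}_A M_x\right) + {}_A M_y = {}_A r_{xy}\,\frac{{}_A S_y}{{}_A S_x}\,(\bar{X} - {}_A M_x) + {}_A M_y.$$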


Similarly, for the matched sample from population B,

$${}_B\bar{Y} = {}_B r_{xy}\,\frac{{}_B S_y}{{}_B S_x}\,(\bar{X} - {}_B M_x) + {}_B M_y = {}_B r_{xy}\,\frac{{}_B S_y}{{}_B S_x}\,(\bar{X} - {}_A M_x + D) + {}_B M_y.$$
Br 


If, now, we subtract, we get

$${}_A\bar{Y} - {}_B\bar{Y} = (\bar{X} - {}_A M_x)\left({}_A r_{xy}\frac{{}_A S_y}{{}_A S_x} - {}_B r_{xy}\frac{{}_B S_y}{{}_B S_x}\right) - {}_B r_{xy}\frac{{}_B S_y}{{}_B S_x}\,D + ({}_A M_y - {}_B M_y).$$


When

$${}_A r_{xy} = {}_B r_{xy} = r_{xy}, \qquad {}_A S_x = {}_B S_x = {}_A S_y = {}_B S_y, \qquad {}_A M_y - {}_B M_y = D,$$

the foregoing expression reduces to the very simple expression

$${}_A\bar{Y} - {}_B\bar{Y} = (1 - r_{xy})\,D.$$


It is possible, following out just the same line of analysis, to determine the 
difference to be expected in a third variable Z when groups have been set up 
matched in terms of two variables, X and Y, and the procedure can be general- 
ized to any number of variables. The resulting formula grows out of the regres- 
sion equation for predicting Z from X and Y, and involves the partial regression 
coefficients. The formula becomes 


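$${}_A\bar{Z} - {}_B\bar{Z} = (\bar{X} - {}_A M_x)({}_A b_{zx.y} - {}_B b_{zx.y}) - {}_B b_{zx.y}\,D_x + (\bar{Y} - {}_A M_y)({}_A b_{zy.x} - {}_B b_{zy.x}) - {}_B b_{zy.x}\,D_y + ({}_A M_z - {}_B M_z),$$

where $D_x = {}_A M_x - {}_B M_x$ and $D_y = {}_A M_y - {}_B M_y$. If ${}_A b_{zx.y} = {}_B b_{zx.y} = b_{zx.y}$ and
${}_A b_{zy.x} = {}_B b_{zy.x} = b_{zy.x}$, this reduces to

$${}_A\bar{Z} - {}_B\bar{Z} = ({}_A M_z - {}_B M_z) - (b_{zx.y}\,D_x + b_{zy.x}\,D_y).$$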

If we are dealing with a test and retest with the same instrument, 
and if we can assume that (a) the standard deviation for both test 
and retest, (b) the test-retest correlation, and (c) any gain in mean 
score from test to retest, are the same for both populations, then the 
difference to be expected between the means of the two matched 
groups on the retest reduces to the very simple expression 


$${}_A\bar{Y} - {}_B\bar{Y} = (1 - r_{11})\,D, \qquad (4)$$

where $r_{11}$ is the test-retest correlation and D is the difference in score
between the means of the two populations from which our matched 
groups were drawn. 

Equation (4) presents the simplest possible picture of the effect 
of regression, uncomplicated by any differences in variability, relation 
of first to second test, or proneness to gain in the two populations. 
This simplified picture will be only an approximation in most actual 
cases, and it will ordinarily be difficult to tell just how reasonable the 
assumptions involved in this formula are for our data. 

It is perfectly possible to develop formulas of the type given in 
equation (3) for the expected difference when the matching is based 
upon two or more variables. The formula in the case of two match- 
ing variables is given in the mathematical note. The formulas are 
straightforward but unwieldy. In practice, the chief difficulty which 
would be encountered would be that some of the statistics with regard 
to the populations from which the matched samples were selected 
would be unknown. Excepting as it is possible to compute or estimate 
these, it is, of course, impossible to solve the equation which provides 
an indication of the expected difference upon the experimental test. 
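For two matching variables, the partial regression coefficients can be computed from the zero-order correlations and standard deviations in the usual way, $b_{zx.y} = \frac{r_{xz} - r_{yz}r_{xy}}{1 - r_{xy}^2}\frac{S_z}{S_x}$ and symmetrically for $b_{zy.x}$. They are then substituted into the general formula of the mathematical note. The Python sketch below does this for two hypothetical populations that share one set of correlations and standard deviations and differ only in their means. All numbers and names are illustrative assumptions.

```python
from dataclasses import dataclass


def partial_slopes(r_xy, r_xz, r_yz, s_x, s_y, s_z):
    """Return (b_zx.y, b_zy.x), the partial regression coefficients of Z on X and Y."""
    denom = 1.0 - r_xy ** 2
    b_zx_y = (r_xz - r_yz * r_xy) / denom * s_z / s_x
    b_zy_x = (r_yz - r_xz * r_xy) / denom * s_z / s_y
    return b_zx_y, b_zy_x


@dataclass
class Population:
    """Hypothetical container for one population's means and partial slopes."""
    mean_x: float
    mean_y: float
    mean_z: float
    b_zx_y: float
    b_zy_x: float


def expected_gap_two(x_bar, y_bar, a, b):
    """Expected A-minus-B difference in mean Z for samples matched at (x_bar, y_bar)."""
    d_x = a.mean_x - b.mean_x
    d_y = a.mean_y - b.mean_y
    return ((x_bar - a.mean_x) * (a.b_zx_y - b.b_zx_y) - b.b_zx_y * d_x
            + (y_bar - a.mean_y) * (a.b_zy_x - b.b_zy_x) - b.b_zy_x * d_y
            + (a.mean_z - b.mean_z))


# Equal correlations and SDs in both populations, so the partial slopes are
# equal and the reduced formula (M_z difference minus b*D terms) also applies.
b_zx, b_zy = partial_slopes(r_xy=0.5, r_xz=0.6, r_yz=0.6, s_x=10, s_y=10, s_z=10)
pop_a = Population(50.0, 50.0, 50.0, b_zx, b_zy)
pop_b = Population(40.0, 40.0, 40.0, b_zx, b_zy)
print(round(expected_gap_two(45.0, 45.0, pop_a, pop_b), 3))
```

Here each slope is .40, so the 10-point population difference in Z is cut by .40 x 10 + .40 x 10, leaving an expected gap of 2 points.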

Let me illustrate this regression effect with an actual research
reported in the psychological literature. I select this example
without malice—I might have selected any of a number of others—because
it is known to me and because it illustrates my point so perfectly. 
Crissey* has reported an investigation of mental development in or- 
phanages and institutions for the feebleminded. Among other things, 
he selected from the test records a group in the orphanage and a 
group in the institution matched in initial I.Q. The average I.Q. of 
the orphanage population was 85, of the institution population 65. 
Now a single Binet I.Q. is not infallible, even as an indicator of per- 
formance on that test the next day, and is a good deal less so as a pre- 
diction of performance a year or two later. We should expect the high 
scores in each population to drop toward the population mean and the 
low scores in each population to rise. Assuming a test-retest correla- 
tion of .80 for the unspecified interval of time between tests in this 
study and assuming the conditions listed for equation (4), we should
expect an I.Q. difference between these two matched sub-samples on 
the retest of about 4 points of I.Q. Of course, the orphanage children 
should score the higher. The obtained difference was 6 points. The 
bulk of the obtained difference needs no other explanation, therefore, 
than the fact that scores on a fallible test tend to regress toward the 
mean of the particular population to which they belong. 
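The arithmetic behind the Crissey example is expression (4). The following minimal Python sketch uses the figures quoted above: population means of 85 and 65, and the assumed test-retest correlation of .80. The function name is hypothetical.

```python
def expected_retest_gap(d, r_retest):
    """Equation (4): expected gap between matched groups on retest, (1 - r) * D."""
    return (1.0 - r_retest) * d


orphanage_mean, institution_mean = 85.0, 65.0  # population mean I.Q.s, from the text
assumed_r = 0.80                               # test-retest correlation assumed in the text
obtained_gap = 6.0                             # difference Crissey obtained on retest

expected = expected_retest_gap(orphanage_mean - institution_mean, assumed_r)
print(f"expected gap from regression alone: {expected:.1f} I.Q. points")
print(f"fraction of the obtained gap: {expected / obtained_gap:.2f}")
```

About two-thirds of the obtained 6-point difference is thus accounted for by regression alone.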

The third pattern arises when we deal with groups which are dif- 
ferentiated with respect to amount of one continuous variable and 
matched with respect to another correlated variable. This differs from 
type No. 2, which we have just considered, only in that the differen- 
tiating factor in the populations from which we select our matched 
groups is amount of some quantitative trait rather than membership 
in one or another discrete category. We might, for example, give a 
large group of pupils an intelligence test and an arithmetic test. Di- 
recting our attention to those cases which fell in the top fourth and 
those which fell in the bottom fourth in intelligence score, we could— 
with some difficulty—so select a smaller sample from each of these 
fourths that the samples were matched in arithmetic test score. If 
these two groups were then given practice in memorizing nonsense 
syllables, and were subsequently retested upon another form of the 
arithmetic test, we might be led to attribute the substantial difference 
between the two groups on this retest (which we would undoubtedly 
find) to the differential effect of memorizing nonsense syllables upon 
the arithmetical ability of bright and dull children. 


* Crissey, O. L. Mental development as related to institutional and educational
residence. University of Iowa Studies in Child Welfare, 1937, 18, No. 1,
p. 81.





























FIGURE 2
Regression in Continuous Variables
[Scatter diagrams of X and Y scores; • = high group score, x = low group
score. Panels: Total Population, Initial Tests (low score group and high
score group); Matched Groups, Initial Tests ($\bar{X}$ = 36.4 and $\bar{X}$ = 48.4,
$\bar{Y}$ = 35.4); Matched Groups, Retest ($\bar{X}$ = 39.3 and $\bar{X}$ = 45.8).]
Actually, however, the difference in the groups on the third test 
(in this instance a repetition of the arithmetic test) is a regression 
phenomenon. The essential feature of our previous situation is re- 
tained, in that the two populations (children high in intelligence and 
children low in intelligence) have different means upon the variable 
which has served as the basis for matching (arithmetic achievement).
Upon a retest, each sample mean will move toward its population 
mean, and the samples will no longer be matched. 

Some artificial data to illustrate this point have been prepared, 
using dice throws as before, and are shown in Figure 2. Scores on a 
test and retest for two variables, X and Y, were composed, using certain
dice for factors general to X and Y, certain ones for factors specific
to each variable, and certain ones for errors of measurement on
each test. Setting up two populations differentiated in X score, cases 
were matched upon the basis of Y score. Then the retest scores in 
both X and Y for the matched groups were examined. The difference 
in mean X score was 12.0 on the test, and dropped to 6.5 on the retest. 
The two groups had the same mean Y score on the initial test, but on 
the retest they differed by 3.3 points. They drew together upon the
test upon which they had been differentiated, and drew apart on the
test upon which they had been matched. 
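The third pattern can be mimicked in the same way as the first. The Python sketch below follows the general recipe just described: dice for a general factor, dice for factors specific to X and to Y, and dice for errors of measurement. Several choices are assumptions, so its numbers will not reproduce those of Figure 2: the number of dice assigned to each part, the population size of 40,000, and the use of the top and bottom quarters on X. What should reappear is the pattern. At retest, the matched groups draw together on X and apart on Y.

```python
import random
from collections import defaultdict


def roll(rng, k):
    """Total number of spots on k fair dice."""
    return sum(rng.randint(1, 6) for _ in range(k))


def make_person(rng):
    """Initial (x1, y1) and retest (x2, y2) scores for one simulated person.

    The allotment of dice (3 general, 2 specific to each variable, 2 error
    dice per testing) is an assumption; the paper does not give it.
    """
    general = roll(rng, 3)
    spec_x, spec_y = roll(rng, 2), roll(rng, 2)
    # True parts carry over to the retest; error dice are thrown afresh.
    return {
        "x1": general + spec_x + roll(rng, 2),
        "x2": general + spec_x + roll(rng, 2),
        "y1": general + spec_y + roll(rng, 2),
        "y2": general + spec_y + roll(rng, 2),
    }


def match_on(key, group_a, group_b):
    """Pair members of two groups who have identical scores on `key`."""
    pools_a, pools_b = defaultdict(list), defaultdict(list)
    for p in group_a:
        pools_a[p[key]].append(p)
    for p in group_b:
        pools_b[p[key]].append(p)
    out_a, out_b = [], []
    for score in pools_a.keys() & pools_b.keys():
        k = min(len(pools_a[score]), len(pools_b[score]))
        out_a.extend(pools_a[score][:k])
        out_b.extend(pools_b[score][:k])
    return out_a, out_b


rng = random.Random(1942)
population = sorted((make_person(rng) for _ in range(40_000)),
                    key=lambda p: p["x1"])
quarter = len(population) // 4
low, high = population[:quarter], population[-quarter:]  # differentiated on X
m_high, m_low = match_on("y1", high, low)                # matched on Y


def gap(key):
    """Mean of the matched high group minus mean of the matched low group."""
    return (sum(p[key] for p in m_high) - sum(p[key] for p in m_low)) / len(m_high)


for key in ("x1", "x2", "y1", "y2"):
    print(key, round(gap(key), 2))
```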

Here, again, it is possible to express in a formula the amount of 
divergence to be expected between the means of the two groups upon 
a third test (which may or may not be a repetition of the matching 
test) when they are matched on one variable Y and differentiated on 
another variable X. Calling the third variable Z, we get the following:

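$$\bar{Z}_H - \bar{Z}_L = \frac{r_{xz} - r_{xy}r_{yz}}{1 - r_{xy}^2}\,\frac{S_z}{S_x}\,(\bar{X}_H - \bar{X}_L). \qquad (5)$$

If $r_{xz} = r_{xy}$, this can be expressed

$$\frac{\bar{Z}_H - \bar{Z}_L}{S_z} = \frac{r_{xy}(1 - r_{yz})}{1 - r_{xy}^2}\cdot\frac{\bar{X}_H - \bar{X}_L}{S_x}. \qquad (6)$$

The derivation of this formula is outlined in the mathematical note.

As we examine this formula, we see that the expected amount of separation,
in standard deviation units, is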





(1) a direct function of the difference between the means 
of the two differentiated sub-populations; 

(2) a direct function of the correlation between the dif- 
ferentiating and the matching variables; 

(3) an inverse function of the correlation between the 
matching and the experimental variable. 
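Equations (5) and (6) can be checked numerically. In the Python sketch below the function names and the illustrative correlations are hypothetical. The two loops show properties (2) and (3) directly. Property (1) follows from the linear factor `x_gap_sd`.

```python
def b_zx_y(r_xy, r_xz, r_yz, s_x, s_z):
    """Partial regression coefficient of Z on X with Y held constant."""
    return (r_xz - r_xy * r_yz) / (1 - r_xy ** 2) * s_z / s_x


def expected_z_gap(x_gap, r_xy, r_xz, r_yz, s_x, s_z):
    """Equation (5): expected Zbar_H - Zbar_L for groups matched on Y."""
    return b_zx_y(r_xy, r_xz, r_yz, s_x, s_z) * x_gap


def separation_sd_units(x_gap_sd, r_xy, r_yz):
    """Equation (6): separation on Z in S_z units, valid when r_xz = r_xy."""
    return r_xy * (1 - r_yz) / (1 - r_xy ** 2) * x_gap_sd


# Hypothetical values: sub-populations 2 S_x apart on X.
for r_xy in (0.3, 0.5, 0.7):          # property (2): rises with r_xy
    print("r_xy =", r_xy, "->", round(separation_sd_units(2.0, r_xy, 0.80), 3))
for r_yz in (0.6, 0.8, 0.95):         # property (3): falls as r_yz rises
    print("r_yz =", r_yz, "->", round(separation_sd_units(2.0, 0.5, r_yz), 3))
```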


MATHEMATICAL NOTE 


Given: A population of individuals measured on variables X and Y. 

Within the total population two sub-populations are set off, having different 
distributions and mean values for X. (Usually one sub-population will consist 
of those scoring high in X and the other of those scoring low.) These sub-populations
are designated $X_H$ and $X_L$, respectively.

Two samples, matched with regard to average score on test Y, are estab- 
lished—one in each of the two sub-populations indicated above. 

Required: To determine the expected difference between the means of the two 
samples upon variable Z. 
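Derivation: For any pair of values of X and Y, the predicted value of Z is

$$\tilde{Z} = b_{zy.x}\,Y + b_{zx.y}\,X + C.$$

Summing over the $n_H$ cases in sample H, we get

$$\bar{Z}_H = \frac{1}{n_H}\sum_{i=1}^{n_H}\tilde{Z}_i = b_{zy.x}\,\frac{1}{n_H}\sum_{i=1}^{n_H}Y_i + b_{zx.y}\,\frac{1}{n_H}\sum_{i=1}^{n_H}X_i + C = b_{zy.x}\,\bar{Y}_H + b_{zx.y}\,\bar{X}_H + C.$$

Similarly, for the $n_L$ cases in group L, we get

$$\bar{Z}_L = b_{zy.x}\,\bar{Y}_L + b_{zx.y}\,\bar{X}_L + C.$$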


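Since the groups were matched on the test Y, we know that $\bar{Y}_H = \bar{Y}_L$. Therefore

$$\bar{Z}_H - \bar{Z}_L = b_{zx.y}(\bar{X}_H - \bar{X}_L) = \frac{r_{xz} - r_{xy}r_{yz}}{1 - r_{xy}^2}\,\frac{S_z}{S_x}\,(\bar{X}_H - \bar{X}_L). \qquad (5)$$

If $r_{xz} = r_{xy}$, this can be expressed

$$\frac{\bar{Z}_H - \bar{Z}_L}{S_z} = \frac{r_{xy}(1 - r_{yz})}{1 - r_{xy}^2}\cdot\frac{\bar{X}_H - \bar{X}_L}{S_x}. \qquad (6)$$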


Again, the formula can be generalized to any number of matching or differ- 
entiating variables. If two groups have been set up which are differentiated on 
variable X, but matched on Y and W, the expected difference on variable Z is

$$\bar{Z}_H - \bar{Z}_L = b_{zx.yw}(\bar{X}_H - \bar{X}_L),$$

and the extension of this to any number of matching variables is quite clear. If
the groups have been differentiated on X and Y, but matched on W, the expected
difference on Z is

$$\bar{Z}_H - \bar{Z}_L = b_{zx.yw}(\bar{X}_H - \bar{X}_L) + b_{zy.xw}(\bar{Y}_H - \bar{Y}_L),$$


and again the extension to additional variables is straightforward. The practical 
problems which will arise will concern the feasibility of determining the desired 
statistics. 


Real situations do arise involving just the type of regression 
which is discussed above. Two have been encountered recently in pro- 
posed doctoral dissertations. In one case, a general intelligence test 
and an analogies test had been administered to a population of stu- 
dents, and two groups were selected which were differentiated in in- 
telligence but matched in analogies score. After some intervening 
training, another analogies test was given. It was found that the 
matched group with the higher intelligence did reliably better on the 
second analogies test. Since the correlations between the intelligence 
test and analogies tests were quite high, and since the correlation be- 
tween the two analogies tests was far from perfect, this result could 
have been predicted, entirely without regard to the intervening ex- 
periences. 

































In the second case, personality characteristics were to be studied 
in two groups which were matched in intelligence, but sharply dif- 
ferentiated in school achievement. Such groups could be built up, 
using intelligence test score on the one hand and achievement test 
score, school grades, and teachers' estimates on the other. But it can
safely be predicted that upon a subsequent retesting they would be 
found to be neither accurately matched in intelligence nor so sharply 
differentiated in school achievement. 

Having examined the regression effects which appear when we 
are dealing with groups from dissimilar populations, we are now led 
to ask: What are we going to do about them? What adaptations should 
we make in our experimental design or statistical treatment? 

The most usual answer has been: Ignore them. This may, in 
some cases, be a reasonable expedient. When the populations do not 
differ greatly in the distribution of the measures being studied, or 
when the correlation between the matching and experimental vari- 
ables is very high, the systematic errors introduced by regression will 
be small, and may very probably be insignificant in comparison with 
the effect of the other factors which are being studied. But in other 
cases, when the two populations differ more sharply and the intercor- 
relations are lower, the systematic regression errors may be of such 
size as to lead to entirely erroneous conclusions. This possibility must 
always be kept in view. 

A second possibility is to insist that all investigations be carried 
out and all comparisons based upon groups selected from within the 
same population. This is certainly an ideal to be striven for. It is 
desirable not only because it eliminates those regression fallacies in 
matched group procedures with which we have been concerned, but 
also because it makes usable other efficient and powerful techniques 
of treatment. The analysis of covariance technique,* the techniques 
developed by Johnson and Neyman,* and a procedure suggested by 
Peters* all make it possible to use every case in each group studied as 
a basis for determining the effect of experimental treatments. These 
procedures all involve correcting scores on the experimental variable 
in terms of differences in background traits, on the assumption that 
the same regression equation of experimental upon background traits
is appropriate for each group. That is, the assumption must be made
that each group is a sample from the same population. These other
procedures will probably be generally preferred to the procedures
involving matching, since matching may make for either administrative
difficulty or the loss of cases with consequent lowering of experimental
efficiency.

* For a discussion of analysis of covariance, see Snedecor, G. W. Statistical
Methods. Ames, Iowa: Collegiate Press, 1938. See also Lindquist, E. F.
Statistical Analysis in Educational Research. New York: Houghton Mifflin, 1940.

* Johnson, P. O. and Neyman, J. Tests of Certain Linear Hypotheses and
Their Application to Some Educational Problems. Statistical Research Memoirs.

* Peters, C. C. A Method of Matching Groups for Experiment With No Loss
of Population. J. educ. Research, 1941, 34, 606-612.

It must be emphasized that all the methods just mentioned as- 
sume, either explicitly or implicitly, that the same regression equation 
between background traits and experimental variable is appropriate 
for both, or in the case of more than two, all groups. When this is not 
the case, those same regression fallacies which we have discussed in 
the case of matched groups are once more encountered and group dif- 
ferences arise simply because of differential regression effects. Analy- 
sis of covariance, the Neyman-Johnson methods, and the procedure 
suggested by Peters do not make any allowance for differences in the 
regression equation, arising most commonly out of differences in the 
means of the populations from which the experimental groups were 
taken. 

Although it would be well, insofar as possible, to avoid investiga- 
tions involving groups from two populations differing appreciably in 
the characteristics under study, there may be some cases when data 
of this sort are the only kind available and must be used. The school 
achievement of delinquents can be assessed only by comparing it with 
that of nondelinquents of comparable ability, even though these 
groups come from quite diverse parent populations. The gains from 
taking Latin can be studied only by comparing a group of Latin-study- 
ing pupils with a group of equivalent non-Latin-studying ones, even 
though the two total pupil populations may be significantly different 
in certain academic traits. In cases of this sort, some procedure to 
take account of differential regression effect is urgently needed. 

We can recognize, in the last paragraph, two types of situations 
calling for samples from different parent populations. In the first 
type, exemplified by the delinquent-nondelinquent comparison, we are
concerned with the effects of mere membership in a particular group 
or category together with whatever that membership may involve or 
imply. We wish to determine whether groups which are truly equiva- 
lent with regard to some background trait are different with regard 
to the experimental variable. If the groups are to be equivalent in 
true score on the background trait, they must be matched on the basis 
of predicted true score—i.e., score predicted by the regression equa- 
tion between original test on the background trait and a retest at the 
time of the experimental comparison. Since the regression equations
for the different populations will not be the same in the case which
we are now considering, the predicted true scores for each individual 
must be determined from the regression equation for his own popula- 
tion. Groups matched in this way will be truly equivalent upon the 
background trait, and differences between them in the experimental 
trait must be due to some factor other than background trait differ- 
ences. 

In the second type of situation, exemplified by the Latin-non- 
Latin comparison, we are interested in studying the effects of a cer- 
tain type of experimental treatment, but the exigencies of life are 
such that that experimental treatment is and can be applied only to a 
population which is selected and atypical of the generality of cases. 
In this case, our concern is to get groups which would, except for the 
effect of the experimental treatment, be equivalent in the final test 
of the ability or trait being measured. We should match members 
from our two groups in terms of predicted final test score. Again the 
regression equation for predicting final test score will be different for 
each population and each final test score must be predicted in terms of 
the regression equation for that population. The regression equation 
for each population must be the one which holds when the special ex- 
perimental treatment is not applied, or else the effect of the experi- 
mental treatment will be absorbed into our regression equation. If 
regressed values are used as indicated above and the groups are 
matched, an observed difference in actual final test scores will be 
attributable to the effect of the experimental treatment. 

These procedures are straightforward, but involve quite a burden 
of computation. The chief difficulty to be encountered will be in de- 
termining the regression equations for the different parent popula- 
tions. An expedient which may often be useful here is to assume that 
the variability and correlation are the same in the several parent 
populations. These could then be approximated by summing up the 
variances and covariances from the sample of each population which 
we have tested and computing the variance and correlation from these 
summed values. In that case, only the values for the means would be 
different in the regression equations for the several populations. 
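The expedient just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure given in the paper. Within-sample sums of squares and cross-products are pooled to give one slope. Each population keeps its own means (here estimated by its sample means). Each case's predicted later score comes from its own population's equation. The data and function names are hypothetical.

```python
from statistics import mean


def pooled_slope(samples):
    """Common slope of later score on initial score.

    `samples` maps a population name to (initial, later) score pairs from
    cases that did not receive the special treatment. Sums of squares and
    cross-products are taken about each sample's own means and then pooled,
    which assumes equal variability and correlation in every population.
    """
    sxx = sxy = 0.0
    for pairs in samples.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


def regressed_scores(samples, slope):
    """Predicted later score for each case, from its own population's equation.

    Only the means differ between populations; the sample means stand in for
    the population means here (an assumption of this sketch).
    """
    predicted = {}
    for name, pairs in samples.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        predicted[name] = [my + slope * (x - mx) for x, _ in pairs]
    return predicted


# Toy data (hypothetical) for two populations with different means.
samples = {"A": [(1, 2), (2, 4), (3, 6)], "B": [(10, 5), (12, 9)]}
b = pooled_slope(samples)
print(b, regressed_scores(samples, b))
```

Cases from different populations having equal (for instance, equal rounded) predicted scores would then be paired, as described above.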

When it is no longer possible to match groups in terms of re- 
gressed scores as indicated in the preceding paragraph, it may still 
be possible to make some allowance for regression effects, making use 
of formulas (3) and (5) of this paper. Some of the statistics called 
for in these formulas may not be available, but even where it is not
possible to compute the allowance precisely, it may be estimated on
the basis of reasonable assumptions about the population statistics.
Such an estimate will probably yield a much less biased result than
the raw experimental differences. 

None of the expedients suggested in this paper seems wholly sat- 
isfactory. What is really needed is some adaptation of the analysis of 
covariance to make it applicable to groups taken from populations 
having different regressions for the traits being studied. It is to be 
hoped that such an adaptation may soon be supplied to research workers.

PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


UNIQUE TYPES OF ACHIEVEMENT TEST EXERCISES 


MAX D. ENGELHART 
DEPARTMENT OF EXAMINATIONS, CHICAGO CITY COLLEGES 


In this article are presented a number of unusual achievement 
test exercises of both the essay and the objective types. These exer- 
cises may suggest to others engaged in the construction of achieve- 
ment tests certain forms which they may find useful either as mod- 
els or as points of departure in the invention of new forms. The 
article also calls attention to certain problems which must be solved 
if achievement testing is to have a sound, scientific basis. 


All examiners engaged in achievement testing know that it is a 
part of the folklore of the teaching profession that objective tests 
measure merely the extent to which information is remembered, while 
essay tests measure much more worth-while things. The widespread 
acceptance of this belief is in itself indicative of the need for further 
research to determine what abilities are actually measured by various 
types of objective and essay exercises, and what instructional condi- 
tions or scoring procedures are prerequisite to the measurement of 
the abilities thus identified. It is possible that the types of essay and 
objective exercises ordinarily used and as ordinarily scored do meas- 
ure nothing more than the extent to which facts are remembered. 
Most teachers use essay questions which imply purely factual answers 
and, in scoring the responses of their students, look for nothing more 
than the facts which are mentioned. Similarly, the objective exercises 
usually prepared may merely sample more extensively the informa- 
tion retained by the student. 

It is possible, however, to construct essay and objective exercises 
which will measure abilities not restricted to memory, and the suc- 
cessful accomplishment of this end depends in part upon the nature 
of the exercises constructed and in part upon the conditions of in- 
struction and scoring. If exercises are to measure more than the mem- 
ory of facts, the exercises must present novel, problematic situations. 
The factor of instructional conditions is mentioned because an exer- 
cise may measure the ability to do discriminative thinking as the re- 
sult of one type of instruction, but may measure only the extent to 
which information has been memorized in the case of another type 
of instruction. The instruction may have functioned as coaching for 
the particular exercise.

It should not be assumed, however, that memorization of facts is
not worth while or that the efficient measurement of the ability is not 
important. The acquisition of factual information is basic to the ac- 
quisition or functioning of the ability to do the thinking involved in 
solving novel problems. The ability to organize knowledge must be 
accompanied by the ability to assimilate knowledge. The development 
of desirable attitudes, ideals, and interests may involve experience in 
applying knowledge to various situations. Measurement of the ability 
to think and of the possession of the less tangible traits just named 
may be accomplished to some extent indirectly through the correlation 
of abilities even though the supposed function of the test is merely 
that of the measurement of factual knowledge. Research is needed to 
determine the extent to which reliance can be placed upon such in- 
direct measurement. 

It is held that tests used in factor analysis studies should be 
homogeneous, since homogeneity is essential in the identification of 
primary abilities. Where the problem, however, is the determination 
of what abilities are actually measured by achievement tests, it would 
seem reasonable to suppose that some degree of heterogeneity may be 
permitted. It would be useful to discover the extent to which achieve- 
ment exercises of different kinds and of different degrees of heter- 
ogeneity require the functioning of the primary abilities which have 
been identified. The achievement exercises may be a part of a bat- 
tery which includes those psychological tests best suited to the meas- 
urement of the primary abilities. It is possible that such studies will 
need to be conducted with consideration of the instructional condi- 
tions preceding the administration of the tests, for it may be found 
that the factorial composition of achievement tests is influenced by 
differences in instructional conditions. Factor analysis studies of the 
kind suggested would be a more fruitful means of determining the 
validity of achievement exercises than investigations of validity 
which involve the correlation of a given test with some criterion of 
uncertain validity. Such studies may also lead to the identification of 
methods of instruction most likely to develop certain desirable abil- 
ities. More reliance could be placed on the results of experimental in- 
vestigations of the relative effectiveness of differing instructional 
methods and materials if the validity of the achievement tests used 
in such experiments had previously been established by factor analy- 
sis research. 

It has been mentioned that essay exercises should represent nov- 
el, problematic situations if the responses of the student are to be 
based upon more than the functioning of the information he has 
remembered. This purpose may be accomplished by exercises of the 
MAX D. ENGELHART 105 


types given below. Blank lines followed each of these exercises in the 
lithoprinted examination booklets. 





(Illustration: René Descartes (1596-1650). Portrait by Frans Hals, Louvre, Paris.)

The following selection from the writings of Descartes contains
his explanations of certain phenomena. State which of his explanations
are still accepted as valid, and indicate which of his explanations
are no longer accepted, pointing out briefly how they have been
modified. “Those who have acquired even the minimum of medical
knowledge know how the heart is composed, and how all the blood
in the veins can easily flow from the vena cava into its right side
and from thence into the lung by the vessel we term the arterial
vein, and then return from the lung into the left side of the heart,
by the vessel called the venous artery, and finally pass from there
into the great artery whose branches spread throughout all the body.
. . . . We know that all movements of the muscles, as also all the
senses, depend on the nerves, which resemble small filaments, or
little tubes, which all proceed from the brain, and thus contain like
it a certain very subtle air or wind which is called the animal
spirits. . . . . So long as we live
there is a continual heat in our heart, which is a species of fire which 
the blood of the veins there maintains, and this fire is the corporeal 
principle of all the movements of our members. ... . Its first effect 
is to dilate (expand) the blood with which the cavities of the heart 
are filled; that causes this blood, which requires a greater space for 
its occupation, to pass impetuously from the right cavity into the ar- 
terial vein, and from the left cavity into the great artery; then when 
this dilation (expansion) ceases, new blood immediately enters from 
the vena cava into the right cavity of the heart, and from the venous 
artery into the left cavity. ... The new blood which has entered into 
the heart is then immediately afterward rarefied (expanded) in the 
same manner as that which preceded it; and it is just this which 
causes the pulse, or beating of the heart and arteries; so that this 











106 PSYCHOMETRIKA 


beating repeats itself as often as the new blood enters the heart. It 
is also just this which gives its motion to the blood, and causes it to 
flow ceaselessly and very quickly in all the arteries and veins, where- 
by it carries the heat which it acquires in the heart to all parts of the 
body, and supplies them with nourishment.” 




The cartoon strip reproduced below portrays an episode in the 
life of Andy Gump. Write a paragraph in which you tell how the 
physiologist would explain Andy’s reaction to the presence of the bear. 





THE GUMPS—BEARING UP 














Information is needed for the writing of a correct response to each 
of these exercises, but it may be hypothesized that the quality of the 
response will also depend upon the extent to which the student criti- 
cally analyzes the situation and thoughtfully organizes his informa- 
tion in an effort to meet it. The scorers should be sensitive to more 
than the correctness of the information presented by the student. Their 
ratings should be based not only on the correctness of the facts, but 
also upon evidences of superior selection, evaluation, and organization 
of the facts presented. More research is needed to determine how such 
a rating can best be accomplished. Comparison of student responses 
with a psychometrically constructed scale of responses to the same 
or to a similar question may be found to be an effective procedure. 
Another possibility is the use of directions for scoring in which such 
characteristics as organization and originality are defined and illus- 
trated, and suggestions are made with respect to the weights to be 
given to each such characteristic. 
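The second possibility above uses scoring directions that define characteristics and assign them weights. That amounts to a weighted average of separate ratings. The Python sketch below assumes that each characteristic is rated on a common scale (for example 0 to 10). The characteristic names come from the text, but the weights and the function name are hypothetical.

```python
# Hypothetical weights; the text suggests weighting characteristics such as
# organization and originality but does not give any values.
WEIGHTS = {"correctness": 0.4, "selection": 0.2, "organization": 0.2, "originality": 0.2}


def weighted_rating(ratings: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-characteristic ratings, on the ratings' own scale."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating given for: {sorted(missing)}")
    total = sum(weights.values())
    # Dividing by the total weight keeps the composite on the rating scale.
    return sum(w * ratings[name] for name, w in weights.items()) / total


essay = {"correctness": 8, "selection": 6, "organization": 7, "originality": 5}
print(weighted_rating(essay))
```

Because the result is divided by the sum of the weights, the weights need not add to one. Only their relative sizes matter.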

Certain essay questions of the social science examinations of
the Chicago City Junior Colleges have called for the interpretation 
or criticism of timely political cartoons reproduced in the examina- 
tion booklets. In a physical science examination one novel essay ex- 
ercise had to do with the Ptolemaic and Copernican theories. Archaic 
diagrams of the solar system relevant to each of these theories were 
presented, and the student was requested to discuss significant differ- 
ences. In examinations in English composition the students have been 
asked to write essays of from five hundred to one thousand words on 
the basis of notes presented on general topics. These notes were com- 
parable to those a student would assemble in reading about a given 
topic in different journals, books, or newspapers. One such essay con- 
cerned the movie Snow White and the Seven Dwarfs while this movie 
was at its height in public interest. The notes were taken from the 
reviews of the photoplay and were accompanied by pictures illustrat- 
ing how such a film is created by Disney and his associates. In one 
English examination the notes used were printed and the pages per- 
forated so that the student could sort and organize the notes prior to 
writing an essay based upon them. In an examination in the human- 
ities students were once asked how they would entertain certain his- 
torical personages if they could return to life and visit the students 
in Chicago. (One student stated that, since he had never heard of 
Gutenberg, that individual could stay dead.) In addition to constitut- 
ing novel situations, such exercises may, by their unusual character, 
establish better rapport. 

The following objective exercises are not unusual in form. Their 
brevity makes possible the sampling of a considerable range of infor- 
mation. A study of their content will support the claim that more 
than a knowledge of facts may function in their solution. 





37. Television, or the transmission of pictures by radio, differs from 
ordinary radio transmission in the use of the (A. antenna; B. 
audion tube; C. photoelectric cell; D. amplifier; E. condenser.) 


38. Given $n = \frac{1}{2l}\sqrt{\frac{F}{M}}$. If M remains constant and F is increased
to four times its original value, while l is doubled, n is (A. unchanged;
B. doubled; C. half as great; D. four times as great; E. eight times as great.)
39. The part of a camera which is analogous to the retina of the eye 
is the (A. shutter; B. film; C. lens; D. bellows; E. diaphragm.) 
40. An alternating current generator differs from a direct current 
generator in that the former, but not the latter, has (A. an arma- 
ture; B. field magnets; C. a commutator; D. a drive shaft; E. 
slip rings.) 
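Item 38 turns on a single proportionality. The sketch below assumes that the relation in the item is the usual law for the frequency n of a stretched string, n = (1/2l)·sqrt(F/M), where F is the tension, M the mass per unit length, and l the length. It compares the two conditions in Python. The function name and the unit values are hypothetical.

```python
import math


def string_frequency(tension: float, mass_per_length: float, length: float) -> float:
    """Assumed form n = (1 / 2l) * sqrt(F / M) of the relation quoted in item 38."""
    return math.sqrt(tension / mass_per_length) / (2 * length)


# Original conditions (arbitrary units) versus F quadrupled and l doubled, M fixed.
before = string_frequency(tension=1.0, mass_per_length=1.0, length=1.0)
after = string_frequency(tension=4.0, mass_per_length=1.0, length=2.0)
print(after / before)  # → 1.0
```

Quadrupling F multiplies sqrt(F/M) by 2, and doubling l divides the result by 2. The two changes cancel.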
The series of exercises from which those given above were taken 
was preceded by the following simple statement of directions: “After 
each item number on the answer sheet blacken the one lettered space 

which designates the correct answer.” Lettering the spaces on the
answer sheet A, B, C, D, and E has kept the directions simple. All
answer sheets are scored by the electric scoring machine of the
International Business Machines Corporation.
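The scoring machine worked electrically, but the rule it applies to a sheet marked in lettered spaces can be sketched in Python. This is an illustration only. The key and responses are hypothetical. Counting a blank or multiply marked item as not right is an assumption made here, not a description of how the IBM machine treated such marks.

```python
def score_sheet(responses: dict[int, set[str]], key: dict[int, str]) -> int:
    """Count items whose blackened spaces consist of exactly the keyed letter."""
    return sum(1 for item, letter in key.items() if responses.get(item, set()) == {letter})


key = {1: "C", 2: "B", 3: "E"}              # hypothetical key
sheet = {1: {"C"}, 2: {"B", "D"}, 3: {"E"}}  # item 2 is double-marked
print(score_sheet(sheet, key))  # → 2
```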

The items given below are preceded by the directions used with 
the series in which they appeared. Each series represented an effort 
to measure knowledge of cause and effect relationships, or the ability 
to apply such knowledge: 





The following exercises pertain to cause and effect relationships. 
After each number on the answer sheet which corresponds to an ef- 
fect, blacken the one lettered space which designates the cause. 
(Choose from the causes listed below.) After each number on the 
answer sheet which corresponds to an organ or substance, blacken the 
one lettered space which designates the organ or substance participat- 
ing in the cause and effect relationship. 


Causes
A. Insufficient secretion of a hormone.
B. Absence of a substance from the diet.
C. Excessive secretion of a hormone.
D. Injection of a substance.
E. Tying off of a duct.





68. Effect: Tender gums, loss of weight, and pallor—a condition 
called scurvy. 

69. Organ or substance: (A. pancreas; B. liver; C. Vitamin C; D.
Vitamin D; E. iodine.)

70. Effect: Passive immunity to diphtheria. 

71. Organ or substance: (A. diphtheria toxin; B. diphtheria
bacteria; C. diphtheria toxoid; D. diphtheria toxin-antitoxin; E.
diphtheria antitoxin.)




The statements below represent the effects of certain causes.
After the first of the two numbers following the statement, blacken
the lettered space which designates the cause. After the second number,
blacken the lettered space which designates the rock, mineral, or
geological formation participating in the cause and effect relationship.


Statement: Igneous rocks formed at or near the surface contain small
crystals.

67. (A. Slow cooling; B. heating by day and cooling by night;
C. chemical action of the atmosphere; D. rapid cooling; E.
mechanical action of the atmosphere.)

68. (A. Granite; B. rhyolite; C. quartz; D. feldspar; E. mica.)

Statement: A certain rock changes to brilliant, snow-white marble.

69. (A. Powerful forces of extensive crustal movement; B.
faulting; C. action of circulating ground water; D. rapid
cooling; E. oxidation, hydration, and carbonation.)

70. (A. Granite; B. andesite; C. shale; D. quartzite; E. limestone.)

The following exercises are illustrative of a variety of ways in 
which it is hoped that objective measurement can be made not only 
of knowledge, but of the ability to think critically and discrimina- 
tively with such knowledge. (In each case only a few of the items 
are listed.) 





The number preceding each item in the exercise below refers to
the corresponding number on the answer sheet. Blacken space
A if the item is true of sound.
B if the item is true of light.
C if the item is true of both sound and light.

101. Velocity in water independent of frequency.

102. Velocity in vacuum independent of frequency.

103. Electromagnetic.

104. Intensity inversely proportional to square of distance from
source.

The numbers preceding the items in each exercise below refer to 
the corresponding numbers on the answer sheet. Follow the specific 
directions given for each exercise. 

Blacken space
A if the statement is true of picture I.
B if the statement is true of picture II.
C if the statement is true of both.
D if the statement is true of neither.

206. Representative of the early nineteenth-century (Barbizon)
school of painting.

207. A good example of eighteenth-century French painting.

208. Dominated by the style of David.

209. Artificial rather than realistic in its treatment of nature.

210. A faithful depiction of nature with realistic detail. 
(The pictures were reproduced on the page facing the exercises.)


The numbers preceding the paired items in the exercise below 
refer to the corresponding numbers on the answer sheet. Considering 
each pair from the standpoint of quantity, blacken space 

A if the item at the left is greater than that at the right. 
B if the item at the right is greater than that at the left. 
C if the two items are of essentially the same magnitude. 





(Diagram: spheres X and Y on two inclined planes, Plane I and Plane II.)


Two spheres, X and Y, of equal masses and radii are placed on two inclined
planes, as shown in the diagram. Neglect friction and air resistance, and assume
that potential energy is measured from the level of points L, M, N, and O.


70. Potential energy of X at F—Potential energy of Y at H. 

71. Potential energy of X at M—Potential energy of Y at N. 

72. Potential energy of X at M—Potential energy of X at L. 

73. Kinetic energy of X on rolling to L—Kinetic energy of X on 
falling to M. 

74. Kinetic energy of X on rolling to L—Kinetic energy of Y on roll- 
ing to O. 

75. Work done on X in raising it from M to F—Work done on X in 
moving it from L to F. 

76. Work done on X in raising it from M to F—Work done on Y in 
raising it from N to H.* 





It is frequently very effective to employ sequences of related 
items. Such sequences are illustrated below: 





155. Comb jelly and flatworms have, but hydra does not have (A. 
nerve cells, B. ectoderm, C. mesoderm, D. endoderm, E. proto- 
plasm.) 


* Other items of the series involved comparisons with respect to acceleration, 
time, loss or gain in potential or kinetic energy, power, force, mechanical advan- 
tage, and mechanical efficiency. The exercise as a whole requires the application 
of numerous principles of mechanics. 


156. The molluscs and the annelids have, but flatworms do not have 
(A. mesoderm, B. simple brains, C. bilateral symmetry, D. 
nephridia, E. muscle cells.) 

157. The arthropods and the annelids have, but flatworms generally 
do not have (A. bilateral symmetry, B. organs of reproduction, 
C. a coelom, D. muscle cells, E. nerve cells.) 

“All ideas are products of experience, or of reflection on experience.
Sensations when given meaning are perceptions. Associations
of perceptions, or of simple ideas, lead to complex or abstract
conceptions whose original source is still experience.”

3. This quotation best represents the point of view of (A. positivism;
B. rationalism; C. idealism; D. pragmatism; E. empiricism.)

4. This point of view was first clearly argued by (A. Descartes; 
B. Locke; C. Spinoza; D. Hume; E. Berkeley.) 

5. The popularizer of pragmatism, (A. George Santayana; B.
Henri Bergson; C. William James; D. Josiah Royce; E. Auguste
Comte), would have added that the truth of ideas thus
acquired is to be tested by their consequences in action.

6. In consequence of the point of view expressed in the quota- 
tion, (A. Bacon; B. Hume; C. Locke; D. Berkeley; E. Des- 
cartes) claimed that the external world is not material. 

7. The position of the person referred to in the preceding item 
is best characterized as (A. nominalism; B. idealism; C. ra- 
tionalism; D. hedonism; E. positivism.) 

8. As a further consequence, (A. Hume; B. Spinoza; C. Kant;
D. Leibnitz; E. Hegel) claimed that neither matter nor mind
could be proved to exist.

9. The position of the person referred to in the preceding item 
is best characterized as one of (A. transcendentalism; B. 
positivism; C. realism; D. hedonism; E. skepticism.) 

10. In an effort to overcome the dilemma (A. Plato; B. Kant;
C. Comte; D. Fichte; E. Dewey) proposed the theory that
knowledge, originating in experience, is organized according
to forms imposed by the mind on the material of sensation.


It is often desirable in achievement testing, particularly when 
the test is to be used as a measure of aptitude for a given field, to base 
the items on selections taken from texts of the type that the student 
may later be expected to study. Statements related to such selections 
may be classified by the student according to the following categories: 


After the number on the answer sheet corresponding to that of each 
statement blacken space 


A if the information given in the selection is sufficient for a
judgment that the statement is definitely true.

B if the information given in the selection is sufficient only to
indicate that the statement is probably true.

C if the information given in the selection is sufficient for a
judgment that the statement is definitely false.

D if the information given in the selection is sufficient only to
indicate that the statement is probably false.

E if the information given in the selection is not sufficient to
indicate any degree of truth or falsity in the statement.


After the number on the answer sheet corresponding to that of each 
statement blacken space 


A if the statement is true, and its truth is supported by informa- 
tion given in the selection. 

B if the statement is true, but its truth is not supported by infor- 
mation given in the selection. 

C if the statement is false, and its falsity is shown by informa- 
tion given in the selection. 

D if the statement is false, but its falsity is not shown by infor-
mation given in the selection.


It will be observed that the second set of categories implies that 
the student is to use whatever relevant information he has and is not 
to base his judgment only on the information given. The first set of 
categories is more useful when the purpose of the test is the meas- 
urement of the aptitudes of individuals who have not had much, if 
any, previous training in the field. 

The exercise which concludes this article represents an effort to 
measure the ability to do scientific thinking. Research is needed to 
establish whether it is a valid test of this ability. One could include 
this and similar exercises in a battery of psychological tests which 
would contain, among other tests, those previously found to be sig- 
nificantly saturated with the inductive and deductive reasoning fac- 
tors. Another means of studying the validity of such exercises would 
be to administer the exercises to a criterion group, the individuals of 
which have been carefully rated with respect to their ability to think 
scientifically. 
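One way to carry out the criterion-group study described above is to correlate scores on the exercise with the ratings of scientific thinking. The Python sketch below computes the Pearson product-moment correlation directly from its definition. The data are invented for illustration. Using Pearson r, rather than a rank correlation, which might suit ordinal ratings better, is an assumption made here.

```python
from math import sqrt


def pearson_r(scores: list[float], ratings: list[float]) -> float:
    """Pearson product-moment correlation between two paired lists."""
    n = len(scores)
    if n != len(ratings) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_s = sum(scores) / n
    mean_r = sum(ratings) / n
    # Sum of cross-products of deviations, and the two sums of squared deviations.
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, ratings))
    sxx = sum((s - mean_s) ** 2 for s in scores)
    syy = sum((r - mean_r) ** 2 for r in ratings)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant list has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical exercise scores and criterion ratings for six students.
exercise_scores = [12, 15, 9, 20, 17, 11]
criterion_ratings = [3, 4, 2, 5, 4, 3]
print(round(pearson_r(exercise_scores, criterion_ratings), 2))
```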


A more elaborate exercise of this type might involve (1) the 
statement of a general problem; (2) a paragraph or two descriptive 
of experimentation with respect to the problem; (3) the directions 
for marking the answer sheet;* (4) several hypotheses, each followed
by the statements to be classified with respect to it and an item call- 
ing for an indication of whether or not the given hypothesis is true 
or false, i.e., tenable or untenable; and (5) five general conclusions, 
of which the student is expected to choose the one which represents 
the best answer to the general problem, the one which represents the 
least satisfactory answer, and the three which are neither the best 
nor the least satisfactory. 

It should be noted that one cannot logically write statements 
which are justified contradictions when the hypothesis is tenable, nor 
justified supporting statements when the hypothesis is untenable. The 
complete set of categories may nevertheless be given. 
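The closing exercise classifies statements into five categories:

- A: supports the hypothesis, justified
- B: supports the hypothesis, not justified
- C: contradicts the hypothesis, justified
- D: contradicts the hypothesis, not justified
- E: not relevant

These categories, together with the logical restriction just noted, can be written out as a small Python sketch. It is an illustration only. The function names and the string labels for a statement's relation to the hypothesis are hypothetical.

```python
def key_letter(relation: str, justified: bool) -> str:
    """Map a statement's relation to the hypothesis onto an answer-sheet letter.

    relation is "supports", "contradicts", or "irrelevant" (neither supports
    nor contradicts); `justified` is ignored for irrelevant statements.
    """
    if relation == "irrelevant":
        return "E"
    if relation == "supports":
        return "A" if justified else "B"
    if relation == "contradicts":
        return "C" if justified else "D"
    raise ValueError(f"unknown relation: {relation!r}")


def key_is_consistent(letters: list[str], hypothesis_tenable: bool) -> bool:
    """A tenable hypothesis admits no justified contradiction (C);
    an untenable one admits no justified support (A)."""
    forbidden = "C" if hypothesis_tenable else "A"
    return forbidden not in letters


print(key_letter("supports", True), key_is_consistent(["A", "B", "E"], True))
```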

Such exercises follow, in their organization, the steps usually 
mentioned in descriptions of scientific thinking; hence, this type of 
exercise may be found useful as a teaching device when the purpose 
of the instructor is that of developing the ability to think scientifically 
or, in more conservative terms, that of imparting some knowledge of 
scientific method. When used as a teaching device, such exercises may 
also lead to the development of the attitude that scientific method is 
important. These purposes are usually stated among the objectives 
of science instruction. 





*In the article cited below, an exercise of this more elaborate form is de-
scribed in which the statements are classified according to whether their support
of the truth or falsity of the hypotheses is direct or indirect. It seems to the
author now that such classification is artificial, unimportant, and not at all clear
cut. It is exceedingly difficult to secure agreement among experts with respect to
the key. This difficulty was not encountered with an exercise of the same general
plan in which the categories are of the “justified” or “not justified” type.
Engelhart, Max D. and Lewis, Hugh B. An attempt to measure scientific
thinking. Educ. psychol. Meas., 1941, 1, 289-294.

Hypothesis: Carbon dioxide (CO₂) is a more potent factor in the
control of breathing than oxygen (O₂).

Experiment: If air from a small closed chamber is breathed and
rebreathed, and care is taken to remove all the expired CO₂, the O₂ of
the chamber will gradually be used up. The concentration of O₂ in the
blood gradually diminishes, with no appreciable change in the blood
CO₂ concentration. In such an experiment, breathing is accelerated
relatively little, even though the experiment is carried to the point
where the O₂ content of the blood is considerably reduced.

However, if the same experiment is repeated except that the expired
CO₂ is not removed from the system but is allowed to accumulate to be
rebreathed again and again, a very marked acceleration of
respiration, as well as extreme discomfort (“air hunger”), will result.
In this experiment O₂ is being depleted from the blood as before, but
CO₂ is accumulating.

Finally, if an individual breathes air containing the normal, or
even more than the normal, percentage of O₂, but containing only a
slight excess of CO₂, respiration will again be accelerated. Here the
O₂ concentration of the blood has been maintained practically
unchanged, and the CO₂ content has increased.

—Adapted from Anton J. Carlson and Victor Johnson’s The Machin- 
ery of the Body. 





After the item number on the answer sheet which corresponds to that 
of each statement blacken space 
A if the statement supports the hypothesis, and this support is 
justified by the experimental data given. 
B if the statement supports the hypothesis, but this support is 
not justified by the experimental data given. 
C if the statement contradicts the hypothesis, and this contradic- 
tion is justified by the experimental data given. 
D if the statement contradicts the hypothesis, but this contradic- 
tion is not justified by the experimental data given. 
E if the statement is not relevant to the hypothesis, or if the 
statement neither supports nor contradicts the hypothesis. 


STATEMENTS 


1. An increase of one per cent in the concentration of oxygen 
in the air breathed has more effect on the rate of breathing 
than an increase of one per cent in the concentration of car- 
bon dioxide. 

2. Reducing the concentration of oxygen in the air breathed 
has less effect on the rate of breathing than increasing the 
concentration of carbon dioxide. 

3. Carbon monoxide combines with hemoglobin of the blood, 
while carbon dioxide does not. 

4. Because increased concentration of carbon dioxide in the 
blood prevents the blood from absorbing as much oxygen 
from the air as it normally should, breathing is accelerated. 

5. Holding the concentration of carbon dioxide in the blood 
constant while decreasing the concentration of oxygen re- 
sults in a slight increase in the rate of breathing. 











6. The cells of the body require oxygen, but carbon dioxide is
only a waste product of cell respiration.

7. Since respiration ceases in air containing no oxygen, even
though the normal amount of carbon dioxide is present, oxygen
is a more potent factor than carbon dioxide in the control
of breathing.

8. Since respiration would ultimately cease in air containing
the normal amount of oxygen but no carbon dioxide, carbon
dioxide is a more potent factor than oxygen in the control
of breathing.

9. Since a lack of oxygen in the blood is known to stimulate
the production of red blood cells, oxygen is a more potent
factor in control of breathing than carbon dioxide.

10. Ordinary air contains about 21 per cent of oxygen but only
.04 per cent of carbon dioxide.

11. One breathes more rapidly up in the mountains where the
air is more rarefied, because of the decreased concentration
of oxygen even though the carbon dioxide in the air is also
less concentrated.

12. In serious cases of pneumonia, patients are placed in oxygen
tents to promote breathing; but no effort is made to
supply the patients with carbon dioxide.


After the item number on the answer sheet which corresponds to that
of each conclusion blacken space
A if you believe that this conclusion is the most justified of the
three conclusions stated.
B if you believe that this conclusion is the least justified of the
three conclusions stated.
C if you believe that this conclusion is neither the most justified
nor the least justified.


CONCLUSIONS


13. One should hesitate to accept the hypothesis since no explanation
is given with respect to why the carbon dioxide influences the
rate of breathing. This aspect of the matter is worth investigating.

14. The experiment shows that decrease of oxygen is less potent
a factor in controlling the rate of breathing than carbon
dioxide.

15. The hypothesis can only tentatively be accepted until the effect
of decreasing the concentration of carbon dioxide is
compared with decreasing the concentration of oxygen.
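The marking rule for the conclusions is to mark one A, one B, and every other conclusion C. That rule is easy to check mechanically. The Python sketch below works for the three conclusions here or for the five of the more elaborate form described earlier. It also counts agreement with a key. The function names and the key in the usage lines are hypothetical.

```python
def valid_conclusion_marks(marks: dict[int, str]) -> bool:
    """Exactly one A (most justified), one B (least justified), all others C."""
    expected = ["A", "B"] + ["C"] * (len(marks) - 2)
    return sorted(marks.values()) == expected


def agreement_with_key(marks: dict[int, str], key: dict[int, str]) -> int:
    """Number of conclusions marked as the key marks them."""
    return sum(1 for item, letter in key.items() if marks.get(item) == letter)


student = {13: "C", 14: "A", 15: "B"}  # hypothetical responses
key = {13: "B", 14: "A", 15: "C"}      # hypothetical key
print(valid_conclusion_marks(student), agreement_with_key(student, key))
```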











PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


CONTRIBUTIONS TO THE MATHEMATICAL THEORY OF 
HUMAN RELATIONS: V 


N. RASHEVSKY 
THE UNIVERSITY OF CHICAGO 


Previously derived equations for the control of the behavior of 
one social group by another are developed further. Cases of inter- 
action of three social classes are studied, and the variation of such 
an interaction with time is investigated. A previously derived equa- 
tion for the ratio of urban to rural population is generalized, and a 
theory of interaction of industrial and agricultural classes is out- 
lined. Some of the theoretically derived equations are compared with 
available sociological data and found in fair agreement. 

In several preceding papers (1-6), we have discussed a mathe- 
matical approach to social problems. Most of the developments were 
based on the consideration of the case in which the behavior of a large 
social group is determined by the behavior of a relatively small frac- 
tion of that group. We considered all individuals in a society as being 
of two different types. The “active” type determines its behavior ac- 
cording to its own tastes, regardless of what others do. The “passive” 
type determines its behavior according to what most of the people
with whom it comes into contact do. The active individuals may
themselves be subdivided into several groups, exhibiting different be- 
haviors. The problem then arises to determine the behavior of the 
passive individuals, as influenced by the conflicting active groups. We 
have treated that problem for the case of two active groups exhibiting
mutually exclusive behaviors under two different assumptions (2). In
one case it was assumed that the effort of the active individuals in 
influencing the passive ones is constant. In the other case we con- 
sidered that this effort decreases as the number of successfully influ- 
enced individuals increases. 

In the first case all the passive individuals exhibit one behavior, 
that of the “stronger” active group. With continuous variation of 
the strength of the two active groups there occurs at a certain point 
a discontinuous variation in behavior of the passive individuals. One 
active group “loses” the control to the other. The strengths of the 
active groups are, ceteris paribus, proportional to the number of in- 
dividuals in those groups. It has been shown (7) that similar situa- 
tions are obtained also for a more general case, when the activity and 

the passivity of the individuals are graded so that there is a continu- 
ous transition from the most active to the most passive individual. 

In the second case the passive individuals exhibit a mixed be- 
havior, a fraction of them exhibiting one behavior, a fraction another, 
and the relative size of fractions varies continuously with the relative 
strength of the two active groups, provided certain inequalities be- 
tween the constants are satisfied. A continuous variation of the 
strengths of the two competing groups results in a continuous varia- 
tion of the relative numbers of individuals exhibiting the two behav- 
iors. The strength of each active group is again proportional to the 
number of individuals in the group. 


I 


Let us investigate the second case in somewhat more detail. Using
the same notation as before (2), we have for the number $x$ of passive
individuals exhibiting behavior A the expression

$$x = \frac{a_0^* x_0 - c_0^* y_0 - (a - c_0^* \varepsilon' y_0) N'}{a_0^* \varepsilon x_0 + c_0^* \varepsilon' y_0 - 2a},$$

with a corresponding expression for $y$. Hence

$$\frac{x}{y} = \frac{a_0^* x_0 - c_0^* y_0 - (a - c_0^* \varepsilon' y_0) N'}{c_0^* y_0 - a_0^* x_0 - (a - a_0^* \varepsilon x_0) N'}, \tag{1}$$

which may be written

$$\frac{x}{y} = \frac{a_0^* \dfrac{x_0}{y_0} - \left[c_0^*(1 - \varepsilon' N') + \dfrac{a}{y_0} N'\right]}{c_0^* - \dfrac{a}{y_0} N' - a_0^*(1 - \varepsilon N') \dfrac{x_0}{y_0}}. \tag{2}$$

The quantity $1 - \varepsilon N'$ must be non-negative if equations (24) and (25)
of (2) are to have physical meaning. Similarly, $x$ and $y$ must be non-negative.
Since the denominator of the expression for $x$, and of the corresponding
expression for $y$, is positive because of inequality (28) of (2), both
numerator and denominator of (1), and hence of (2), are positive. It
follows that

$$c_0^*(1 - \varepsilon' N') + \frac{a}{y_0} N' > 0 \quad \text{and} \quad c_0^* - \frac{a}{y_0} N' > 0. \tag{3}$$

Hence if we keep $y_0$ constant, but increase $x_0$ so as to increase
$x_0/y_0$, the ratio $x/y$ will increase continuously, but will become infinite
when

















N. RASHEVSKY 119 


$$\frac{x_0}{y_0} = \frac{c_0^* - \dfrac{a}{y_0} N'}{a_0^*(1 - \varepsilon N')}. \tag{4}$$

For this and higher values of $x_0/y_0$, all passive individuals exhibit
behavior A. On the contrary, when

$$\frac{x_0}{y_0} = \frac{c_0^*(1 - \varepsilon' N') + \dfrac{a}{y_0} N'}{a_0^*}, \tag{5}$$

then $x/y = 0$. For this and smaller values of $x_0/y_0$, all passive
individuals exhibit behavior B.

It may seem strange that equations (4) and (5) are not symmetric.
This is because we vary $x_0/y_0$ for a fixed $y_0$. We obtain
corresponding expressions if we keep $x_0$ constant.

Putting $a = 10$ ind./day, $a_0^* = c_0^* = 10^3$ ind./day, $N' = 10^7$ ind.,
$y_0 = 1.3 \times 10^5$ ind., and $\varepsilon = \varepsilon' = 0.8 \times 10^{-7}$ ind.$^{-1}$, we find that
$x/y = \infty$ when $x_0/y_0 = 1.5$, although $x/y = 1$ when $x_0/y_0 = 1$. With
$x_0$ and $y_0$ being only of the order of 1% of the total population, we see
that a rather small change in the size of the active group may change the
behavior of the population very appreciably.
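Equations (1), (4), and (5) lend themselves to a quick numerical check. The Python sketch below assumes the closed form x = [a0*x0 - c0*y0 - (a - c0*eps'*y0)N'] / (a0*eps*x0 + c0*eps'*y0 - 2a), where a0 and c0 stand for the starred constants, together with the symmetric expression for y. It also assumes that the shared denominator is positive, as inequality (28) of (2) requires. Outside the two critical ratios, the sketch assigns all N' passive individuals to one behavior, as the text describes. The class and function names are hypothetical. The constants in the usage lines are assumed values on the scale of the numerical example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Constants of the two-active-group model (field names are illustrative)."""
    a: float          # a, ind./day
    a_star: float     # a_0*, ind./day (active group favoring behavior A)
    c_star: float     # c_0*, ind./day (active group favoring behavior B)
    eps: float        # epsilon, 1/ind.
    eps_p: float      # epsilon', 1/ind.
    n_passive: float  # N', number of passive individuals


def raw_split(x0: float, y0: float, p: Params) -> tuple[float, float]:
    """Unclamped x and y from the closed forms; assumes the shared denominator is positive."""
    d = p.a_star * p.eps * x0 + p.c_star * p.eps_p * y0 - 2 * p.a
    x = (p.a_star * x0 - p.c_star * y0 - (p.a - p.c_star * p.eps_p * y0) * p.n_passive) / d
    y = (p.c_star * y0 - p.a_star * x0 - (p.a - p.a_star * p.eps * x0) * p.n_passive) / d
    return x, y


def critical_ratios(y0: float, p: Params) -> tuple[float, float]:
    """x0/y0 at which x/y falls to 0 (eq. 5) and becomes infinite (eq. 4), y0 fixed."""
    lower = (p.c_star * (1 - p.eps_p * p.n_passive) + p.a * p.n_passive / y0) / p.a_star
    upper = (p.c_star - p.a * p.n_passive / y0) / (p.a_star * (1 - p.eps * p.n_passive))
    return lower, upper


def passive_split(x0: float, y0: float, p: Params) -> tuple[float, float]:
    """x and y, with all N' passive individuals on one side beyond the critical ratios."""
    lower, upper = critical_ratios(y0, p)
    ratio = x0 / y0
    if ratio >= upper:  # x/y infinite: everyone exhibits behavior A
        return p.n_passive, 0.0
    if ratio <= lower:  # x/y = 0: everyone exhibits behavior B
        return 0.0, p.n_passive
    return raw_split(x0, y0, p)


# Assumed constants on the scale of the numerical example in the text.
params = Params(a=10, a_star=1e3, c_star=1e3, eps=0.8e-7, eps_p=0.8e-7, n_passive=1e7)
y0 = 1.3e5
for r in (0.9, 1.0, 1.1, 1.2):
    x, y = passive_split(r * y0, y0, params)
    print(f"x0/y0 = {r:.1f}: x = {x:,.0f}, y = {y:,.0f}")
```

Whatever constants are used, the unclamped x and y always sum to N'. That identity is a convenient check on any transcription of the formulas.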

In previous papers (2, 8) we have seen that there are two possible 
causes for the changes in the sizes or relative strengths of the active
groups. One cause, actually found in human history, is due to the 
formation of closed hereditary social classes (2). Due to the dissimi- 
larity of parents and offspring, such a hereditary closed class, which 
originally may be composed all of active individuals of a given type, 
will gradually be “thinned out” by passive individuals, born to active 
parents. This results in a general weakening of the activeness of the 
class as a whole (2). The other cause is the actual variation of the 
relative sizes of the different active groups, due to the differential 
birth and death rates. As has been previously shown (8) when we 
have at least three different types of individuals, e.g., two active and 
one passive, the relative numbers of individuals may fluctuate peri- 
odically. This will result in fluctuations in behavior, in other words, in 
the social relations of the whole population, although, as remarked be- 
fore (8), these fluctuations will not necessarily be periodical. The 
whole problem of social changes in a population thus becomes essen- 
tially a problem of biology of mutual interaction of different species, 
regarding each type of individual as a species. In sociology and his- 
tory changes with respect to time of ideologies, morals, tastes, etc. 
are frequently discussed. A quantitative mathematical description of 














120 PSYCHOMETRIKA 


such abstract, almost intangible things may well be impossible. But 
the problem changes radically when we speak of the changes in the 
relative number of the individuals professing given ideologies, morals, 
or tastes. A quantitative description of these changes is quite pos- 
sible. And this is precisely what the developments of our previous 
papers and of the present one are driving at. 


II 


We have previously considered two types of interaction of active
and passive individuals. In one case the active individuals simply
impose certain behavior patterns upon the passive ones (2). Except
for sufficiently large coefficients of influence, this does not imply any
special technical abilities on the part of the active individuals. On the
other hand, we have also considered the case when the active class,
due to special abilities and knowledge, organizes the activities of the
passive one in such a way that these activities become much more
productive than they would be without such organizing influence (3). We
shall now consider another, more complex situation. Let there be altogether
three classes: active class I, active class II, and passive class III. Let
the numbers of individuals in the corresponding classes be $N_1$, $N_2$, and
$N_3$. Let

$$N = N_1 + N_2 + N_3. \tag{6}$$

The corresponding coefficients of influence we shall denote by $a_1$, $a_2$,
and $a_3$. Let us consider the first type of interaction, that of constant
effort. Let








$$N_1 > \frac{a_2}{a_1 + a_3}\,N_2 + \frac{a_3 - a_1}{a_1 + a_3}\,N_3. \tag{7}$$


This is essentially the inequality (8) of a previous paper (2), with 
changed notations. It expresses the condition for class I to control 
the behavior of the whole passive population. The general behavior 
pattern will be determined by the dictates of class I. 

Let class II, on the other hand, be an organizing class. If the
organization of class III by class II, in order to increase the production
of some useful goods, does not interfere with the general behavior
pattern imposed by class I, then class II will be permitted to proceed
with such an organization, provided a certain amount of the useful
goods produced is given to class I.

Concerning the interaction between class II and class III, several
different assumptions may be made (3, 4). For the sake of definiteness













we shall consider here, as an illustration of the method only and
without any prejudice in favor of it, the case mentioned in a previous
paper (4, page 209). Denoting again by $\theta$ the fraction of goods
produced which is given to class III, by $w_2$ and $w_3$ the amounts of labor
given by classes II and III, respectively, and assuming as before
“demand functions” of the form

$$w_2 = w_{02} - \frac{b_2}{1 - \theta}, \qquad w_3 = w_{03} - \frac{b_3}{\theta}, \tag{8}$$

we find that the amount of goods received by class II, in the absence
of any interference by class I, is

$$W_2 = N_3 f(\eta)\,(1 - \theta)\left(w_{02} - \frac{b_2}{1 - \theta}\right)\left(w_{03} - \frac{b_3}{\theta}\right), \tag{9}$$

where

$$\eta = \frac{N_2}{N_3}, \tag{10}$$

and $f(\eta)$ is the same as defined in (2).

Let us suppose, however, that class I, which controls the whole
situation, requires a certain fraction $\alpha$ of everything produced by
the cooperation of classes II and III. Thus class II will now retain
only a fraction $(1 - \theta)(1 - \alpha)$ of the goods, while class III retains only
$\theta(1 - \alpha)$. These amounts should now be introduced into equation (9)
instead of $(1 - \theta)$ and $\theta$, respectively. Moreover, it may be argued that
the constants $b_2$ and $b_3$ are functions of $f$. For $f$ measures the
efficiency of supervision, and the greater $f$, the larger the total amount
produced. Thus the greater $f$, the smaller the fraction of the goods
produced which may be worth retaining. Hence we put

$$b_2 = \frac{b_2'}{f}, \qquad b_3 = \frac{b_3'}{f}. \tag{11}$$


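Instead of (9) we now have

$$W_2 = N_3 f\,(1 - \theta)(1 - \alpha)\left[w_{02} - \frac{b_2'}{f(1 - \theta)(1 - \alpha)}\right]\left[w_{03} - \frac{b_3'}{f\theta(1 - \alpha)}\right]. \tag{12}$$

Class II will fix $\theta$ in such a way as to make the quantity

$$(1 - \theta)\,N_3 f\left[w_{02} - \frac{b_2'}{f(1 - \theta)(1 - \alpha)}\right]\left[w_{03} - \frac{b_3'}{f\theta(1 - \alpha)}\right] \tag{13}$$

a maximum (4). For a fixed $\alpha$, the value $\theta_m$ of $\theta$ which maximizes
(13) is equal to

$$\theta_m = \left[\frac{b_3'}{f(1 - \alpha)\,w_{02}w_{03}}\left(w_{02} - \frac{b_2'}{f(1 - \alpha)}\right)\right]^{1/2}. \tag{14}$$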
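A quick numerical check makes the closed form (14) more concrete. The sketch below is a minimal illustration, assuming the forms of (13) and (14) written above and arbitrary parameter values that are not taken from the paper; it compares the formula for $\theta_m$ with a brute-force search over $\theta$ for one fixed $\alpha$.

```python
import math

def class_II_quantity(theta, alpha, N3, f, w02, w03, b2p, b3p):
    """Quantity (13): what class II maximizes over theta for a fixed alpha."""
    c = f * (1.0 - alpha)
    return ((1.0 - theta) * N3 * f
            * (w02 - b2p / (c * (1.0 - theta)))
            * (w03 - b3p / (c * theta)))

def theta_m(alpha, f, w02, w03, b2p, b3p):
    """Closed-form maximizer (14)."""
    c = f * (1.0 - alpha)
    return math.sqrt(b3p / (c * w02 * w03) * (w02 - b2p / c))

# Hypothetical illustrative values, chosen only so that (13) has an interior maximum.
params = dict(N3=1000.0, f=2.0, w02=8.0, w03=8.0, b2p=4.0, b3p=4.0)
alpha = 0.2

# Brute-force search over theta in (0, 1) on a fine grid.
grid = [k / 10000 for k in range(1, 10000)]
theta_grid = max(grid, key=lambda t: class_II_quantity(t, alpha, **params))
theta_formula = theta_m(alpha, params["f"], params["w02"], params["w03"],
                        params["b2p"], params["b3p"])
print(f"grid: {theta_grid:.4f}  formula (14): {theta_formula:.4f}")
```

Setting the derivative of (13) with respect to $\theta$ to zero gives $\theta^2 = B_3(w_{02} - B_2)/(w_{02}w_{03})$ with $B_i = b_i'/[f(1-\alpha)]$, which is what the grid search should reproduce.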


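On the other hand, class I, if it controls the whole population
completely, will fix $\alpha$ at a value $\alpha_m$ in such a way as to maximize the total
amount it receives, namely,

$$\alpha N_3 f\left[w_{02} - \frac{b_2'}{f(1 - \theta)(1 - \alpha)}\right]\left[w_{03} - \frac{b_3'}{f\theta(1 - \alpha)}\right]. \tag{15}$$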


By substituting (14) into (15), we obtain the latter expression as a
function of $\alpha$ only. By differentiating we then find the value $\alpha_m$ in
terms of $N_3$, $f$, $w_{02}$, $w_{03}$, $b_2'$, and $b_3'$. Substituting that value $\alpha_m$ into
(14), we find $\theta_m$ as a function of the foregoing six quantities. Once
$\theta_m$ and $\alpha_m$ are determined, we have the total rate of accumulation of
goods by each of the three classes, according to equations set up
before (3, 4). Considering now changes of $N_2$ and $N_3$ with respect to
time, and hence changes also of $f$, we can calculate the relative
“wealth” of the three classes in its dependence on time.

Relation (9), chosen as an illustration, does not allow expressions
for $\theta_m$ and $\alpha_m$, as functions of these six quantities, in closed form. The same holds for other
similar relations. More or less complicated approximate solutions of
this problem are hardly worthwhile at this stage, since the
assumptions made are too crude to be applied to actual cases and since
hardly any accurate data on the subject are available. As a second
illustration we shall consider a somewhat different picture, which will
be more familiar to the student of mathematical economics.

Consider class II as a monopolist, producing an amount $u$ of
goods at a cost $q(u)$, and let

$$q(u) = Au^2 + Bu + C, \qquad (A, B, C > 0). \tag{16}$$

Let the goods be sold at a price $p$ per unit, and let there be a demand
function for the goods of the form

$$u = b - ap, \qquad a, b > 0. \tag{17}$$
In actual cases $q(u)$ is composed of the cost of labor as well as the cost
of material, transportation, etc. We shall consider here, for
simplicity, the fictitious case in which $q(u)$ consists only of the cost of labor.
The situation, though unreal, is not quite impossible. If we consider
the interaction of several industries, then there is an exchange of
materials between them, which enters into $q(u)$. The cost of material
for one industry amounts ultimately, however, to the cost of labor in
another industry, which produces it. If we therefore consider a sort













of average for all the industries, denoting by $u$ the amount of goods
produced by all of them, then $q$ eventually becomes the cost of labor
only. This of course raises the question of whether we can
measure the products of different industries in common units. Inasmuch
as we are discussing here only a theoretical case, we need not worry
further about the assumption made.

Let class I impose a tax $\xi$ per unit of goods produced. It receives
altogether an amount $\xi u$ of money, which in terms of goods is
equivalent to $\xi u/p$ units of goods. Class II produces $u$ units of goods. It
pays for those goods to class III the amount $q(u)$, which is equivalent
to $q(u)/p$ units of goods. Moreover, it pays to class I an equivalent of
$\xi u/p$ units of goods. It retains $u - q(u)/p - \xi u/p$ units. Class III
gets an equivalent of $q(u)/p$ units of goods. The case, though at first
sight very different from the one discussed before, can thus be
expressed in terms similar to the former. We have

$$\alpha = \frac{\xi u/p}{u} = \frac{\xi}{p}, \qquad \theta(1 - \alpha) = \frac{q(u)/p}{u}. \tag{18}$$

The relations between $\theta$ and $\alpha$ assumed here are different, however.
If class II adjusts the production so as to maximize its profits,
we have (9, p. 50)

$$u = \frac{b - (B + \xi)a}{2 + 2Aa}, \tag{19}$$

$$p = \frac{b + 2Aab + (B + \xi)a}{2a(1 + Aa)}. \tag{20}$$
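The monopoly formulas (19) and (20) follow from maximizing the profit $pu - q(u) - \xi u$, with $p = (b - u)/a$ obtained by inverting (17). A minimal check, with arbitrary illustrative constants that are assumptions rather than data from the paper, compares them with a direct grid search over $u$:

```python
def monopoly_optimum(A, B, a, b, xi):
    """Output (19) and price (20); the fixed cost C does not affect the optimum."""
    u = (b - (B + xi) * a) / (2 + 2 * A * a)
    p = (b + 2 * A * a * b + (B + xi) * a) / (2 * a * (1 + A * a))
    return u, p

def profit(u, A, B, C, a, b, xi):
    """Profit of class II after paying costs (16) and the tax xi*u."""
    p = (b - u) / a              # inverse of the demand function (17)
    q = A * u**2 + B * u + C     # cost function (16)
    return p * u - q - xi * u

# Hypothetical constants.
A, B, C, a, b, xi = 0.01, 2.0, 50.0, 5.0, 200.0, 3.0

u_star, p_star = monopoly_optimum(A, B, a, b, xi)
grid = [k / 100 for k in range(0, 20001)]    # u from 0 to 200
u_grid = max(grid, key=lambda u: profit(u, A, B, C, a, b, xi))
print(u_star, p_star, u_grid)
```

Since the profit is a concave quadratic in $u$, the grid maximum should sit within one grid step of (19), and (20) should coincide with $(b - u)/a$ evaluated at that output.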


Consider the case in which class I determines the taxation in such a way
as to make the amount $\xi u/p$ a maximum. Equations (19) and (20)
give us the amount $\xi_m$ of such a tax. The expression for $\xi_m$ in that
case can be obtained in closed form; however, it is rather clumsy and
unusable. We may consider an alternative hypothesis, namely, that
class I fixes $\xi$ so as to maximize not the equivalent amount $\xi u/p$ of
goods received, but the monetary rate $\xi u$.

We must thus also consider in terms of money the amounts
retained by class II, namely, $pu - q(u) - \xi u$, and by class III, namely,
$q(u)$. This gives

$$\xi_m = \frac{b - Ba}{2a}, \tag{21}$$

which is positive, since $b - Ba$ is greater than zero (9). We also have













$$(\xi u)_{\max} = \frac{(b - Ba)^2}{8a(1 + aA)}. \tag{22}$$
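Equations (21) and (22) can be checked in the same spirit: with $u$ given by (19), the revenue $\xi u = \xi\,(b - Ba - a\xi)/(2 + 2Aa)$ is a concave quadratic in $\xi$. The sketch below, with illustrative constants that are assumptions only, locates its maximum by grid search and compares it with (21) and (22).

```python
def tax_revenue(xi, A, B, a, b):
    """Monetary revenue xi*u of class I, with u from (19)."""
    u = (b - (B + xi) * a) / (2 + 2 * A * a)
    return xi * u

# Hypothetical constants, chosen so that b - B*a > 0.
A, B, a, b = 0.01, 2.0, 5.0, 200.0

xi_m = (b - B * a) / (2 * a)                             # equation (21)
revenue_max = (b - B * a) ** 2 / (8 * a * (1 + a * A))   # equation (22)

grid = [k / 1000 for k in range(0, 40001)]               # xi from 0 to 40
xi_grid = max(grid, key=lambda x: tax_revenue(x, A, B, a, b))
print(xi_m, xi_grid, revenue_max, tax_revenue(xi_m, A, B, a, b))
```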


To connect these expressions with the problem of the variation
of the socio-economic structure with time, we must express the
quantities $A$, $B$, $C$, $a$, and $b$ in terms of $N_1$, $N_2$, and $N_3$. It will be
rather natural to assume that $a$ and $b$ are proportional to $N$ or $N_3$,
since $N_1 \ll N_3$ and $N_2 \ll N_3$: ceteris paribus, doubling the number of
consumers will approximately double the demand. Concerning $A$, $B$,
and $C$, we may consider that the greater class II, in other words, the
more people there are capable of supervising the work, the smaller will
be $q$. With an increase of $N_2$, the number of inventors will also
increase, facilitating production and therefore reducing $q$, since
inventors come from class II. Assuming, only as an illustration, that

$$A \propto \frac{1}{N_2}, \quad B \propto \frac{1}{N_2}, \quad C \propto \frac{1}{N_2}, \quad a \propto N_3, \quad b \propto N_3, \tag{23}$$


let us first consider the simplest possible case, in which the total
population increases but the ratio $N_2/N_3$ remains constant. Then we
see from equation (21) that $\xi_m$ will first increase, then tend
asymptotically to a constant value. For very large $N_3$, $u$ will approximately
grow as $N_3$, as do $p$ and $q$; hence we have

$$\xi u \propto N_3, \quad q \propto N_3, \quad u \propto N_3, \quad p \propto N_3.$$


The ratio $\xi u/(pu - q - \xi u)$ of the amounts retained by classes I and II
will therefore vary, for larger $N$'s, as $1/N_3 \propto 1/N$. With
increasing population the rate of accumulation of wealth by class I will be
decreased as compared with that of class II. The relative wealth of
the two classes will also decrease. It must be noted that the ratio
$\xi u/(pu - q - \xi u)$ may initially be either greater or less than 1/2,
depending on the values of the constants. Thus the controlling class
need not necessarily be the wealthiest. In any case, however, under
the assumptions made here, the controlling class will gradually
become relatively poorer as compared with class II.

We have assumed that $N_2/N_3$ remains constant as $N$ increases.
This means (2) an unrestricted social mobility between class II and
class III. This is probably what actually happens for the class of
people characterized by special abilities of a technical and scientific
nature. If we assume that between class I and class III the social
mobility is small, then, as has been shown elsewhere (2), after a lapse
of time inequality (7) will cease to hold, and the control will pass to













class II. An expression has been derived for the moment $t^*$ when this
will happen. For $t > t^*$ the relations discussed in this section do not
hold, for class II does not need to give anything to class I. Hence, if
at the moment $t^*$ class I is still relatively wealthy, then at $t = t^*$ there
will be a sudden drop in the wealth of class I relative to class II. On
the other hand, at $t = t^*$ the first class may already be sufficiently less
wealthy than the second, in which case the discontinuity will be either
less pronounced or altogether absent.

In deriving the expression for $t^*$ we assumed the coefficients $a_1$,
$a_2$, and $a_3$ to be constant (2). Actually these coefficients will increase
with increasing wealth of the corresponding classes, because they
increase with the amount of technical facilities for communication
available. Thus by making $a_1$, $a_2$, and $a_3$ functions of time, through their
functional relation to the total amounts of wealth available, we shall
have a more complex equation for $t^*$. This is a problem of interest,
requiring a special investigation.


III 


We may also consider the whole problem of interaction of classes
I and II from a different angle, as put forth in a recent paper (6).
We may consider that while class II supplies some material goods,
necessary for life, class I provides goods of a nonmaterial character,
which are, however, also necessary to the community. These may be
the organization of military defense, legislation, policing, etc. Class
II may agree to supply class I with a definite fraction of its products
provided class I in its turn adequately supplies it with the useful
results of its activities. The question may arise as to the units in which
we shall measure the “goods” supplied by class I. While it may be
difficult to give a general definition of such units, for all specific
instances the problem is solved in daily life. A public officer is
discharged when he does not perform his duties adequately. In other
words, he does not receive in that case the remuneration which would
be forthcoming to a competent person. In all such cases we have an
exchange of nonmaterial goods against material. It is true that in
some cases, as for instance in the case of a business executive, the
performance of his duties may be directly measured by the number of
dollars of profit which it brings to the business. In other cases,
however, as for instance in the case of a judge or a policeman or an army
officer, this is not possible. In every individual case, however, a
definite criterion for measuring indirectly the dollar value of some
activities does exist, and this defines a practical unit of such an activity.

In general, we must consider that each class may produce both 













goods, but one class produces predominantly one type of goods, the
other class the other type. Denoting, as in (6), by $a_1$, $a_2$, $b_1$, $b_2$ the
amounts of the two goods produced by each class and by $x_1$, $x_2$, $y_1$,
and $y_2$ the actual amounts possessed by the corresponding classes after
the exchange, we have, using the same notation as in (6), the
following equations for the determination of $x_1$, $x_2$, $y_1$, $y_2$ (9, p. 128):

$$\frac{X_1(x_1, y_1)}{X_2(a_1 + a_2 - x_1,\; b_1 + b_2 - y_1)} = \frac{Y_1(x_1, y_1)}{Y_2(a_1 + a_2 - x_1,\; b_1 + b_2 - y_1)}, \tag{25}$$

$$\frac{y_1 - b_1}{x_1 - a_1} = -\frac{X_1(x_1, y_1)}{Y_1(x_1, y_1)}, \tag{26}$$

$$x_1 + x_2 = a_1 + a_2, \qquad y_1 + y_2 = b_1 + b_2. \tag{27}$$


Following L. L. Thurstone (10), we shall choose as an illustration for
the satisfaction functions $s_1$ and $s_2$ the following expressions:

$$s_1 = A_1 \log \alpha_1 x + B_1 \log \beta_1 y, \qquad s_2 = A_2 \log \alpha_2 x + B_2 \log \beta_2 y. \tag{28}$$

For the sake of simplicity we shall first put $A_1 = A_2 = B_1 = B_2 = \alpha_1 = \beta_1 = \alpha_2 = \beta_2 = 1$,
so that $s_1 = s_2 = s$; in other words, we shall
consider first the satisfaction function as the same for all individuals
and study only the effect of different abilities to produce a given good
or service.





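We have

$$S_1 = N_1 s, \qquad S_2 = N_2 s, \tag{29}$$

while $X_1$, $Y_1$, $X_2$, and $Y_2$ are defined by

$$X_1 = \frac{\partial S_1}{\partial x}, \quad Y_1 = \frac{\partial S_1}{\partial y}, \quad X_2 = \frac{\partial S_2}{\partial x}, \quad Y_2 = \frac{\partial S_2}{\partial y}. \tag{30}$$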


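Equations (25)-(27) now give:

$$x_1 = \frac{2a_1b_1 + a_1b_2 + a_2b_1}{2(b_1 + b_2)}, \qquad x_2 = \frac{2a_2b_2 + a_1b_2 + a_2b_1}{2(b_1 + b_2)},$$

$$y_1 = \frac{2a_1b_1 + a_1b_2 + a_2b_1}{2(a_1 + a_2)}, \qquad y_2 = \frac{2a_2b_2 + a_1b_2 + a_2b_1}{2(a_1 + a_2)}. \tag{31}$$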
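The solution (31) can be verified by substitution. With $s = \log x + \log y$, the marginal satisfactions (30) are $X_1 = N_1/x_1$, $Y_1 = N_1/y_1$, and similarly for class 2, so the factors $N_1$ and $N_2$ cancel from both sides of (25) and from (26). The sketch below uses exact rational arithmetic and arbitrary illustrative productions (assumed, not from the paper) to check that (31) satisfies (25)-(27).

```python
from fractions import Fraction as Fr

def exchange_solution(a1, a2, b1, b2):
    """Holdings after exchange, equation (31), for s = log x + log y."""
    x1 = (2*a1*b1 + a1*b2 + a2*b1) / (2*(b1 + b2))
    x2 = (2*a2*b2 + a1*b2 + a2*b1) / (2*(b1 + b2))
    y1 = (2*a1*b1 + a1*b2 + a2*b1) / (2*(a1 + a2))
    y2 = (2*a2*b2 + a1*b2 + a2*b1) / (2*(a1 + a2))
    return x1, x2, y1, y2

# Hypothetical productions: class 1 makes mostly x, class 2 mostly y.
a1, a2, b1, b2 = Fr(10), Fr(1), Fr(2), Fr(9)
x1, x2, y1, y2 = exchange_solution(a1, a2, b1, b2)

# Marginal satisfactions with the class sizes N1, N2 dropped (they cancel).
X1, Y1, X2, Y2 = 1 / x1, 1 / y1, 1 / x2, 1 / y2

eq25 = X1 / X2 - Y1 / Y2                             # residual of (25)
eq26 = (y1 - b1) / (x1 - a1) + X1 / Y1               # residual of (26)
eq27 = (x1 + x2 - (a1 + a2), y1 + y2 - (b1 + b2))    # residuals of (27)
print(x1, x2, y1, y2, eq25, eq26, eq27)
```

With these values class 1 gives up 4 units of $x$ and receives 4 units of $y$, and all residuals vanish exactly.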













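Introducing

$$\lambda = \frac{a_2}{a_1}, \qquad \mu = \frac{b_2}{b_1}, \tag{32}$$

we have

$$\frac{x_1}{x_2} = \frac{y_1}{y_2} = \frac{2a_1b_1 + a_1b_2 + a_2b_1}{2a_2b_2 + a_1b_2 + a_2b_1} = \frac{2b_1 + \lambda b_1 + b_2}{2\lambda b_2 + \lambda b_1 + b_2} = \frac{2a_1 + \mu a_1 + a_2}{2\mu a_2 + \mu a_1 + a_2}. \tag{33}$$

We also have

$$\frac{\partial}{\partial \lambda}\left(\frac{x_1}{x_2}\right) = -\frac{2(b_1 + b_2)^2}{(2\lambda b_2 + \lambda b_1 + b_2)^2} < 0, \qquad \frac{\partial}{\partial \mu}\left(\frac{y_1}{y_2}\right) = -\frac{2(a_1 + a_2)^2}{(2\mu a_2 + \mu a_1 + a_2)^2} < 0. \tag{34}$$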
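A small numerical sketch confirms that the three forms in (33) agree and that a central finite difference with respect to $\lambda$ matches the closed form (34); the negative sign of (34) is what drives the conclusions that follow. The productions below are arbitrary illustrative values, not taken from the paper.

```python
def ratio_direct(a1, a2, b1, b2):
    """x1/x2 (= y1/y2), first form of (33)."""
    return (2*a1*b1 + a1*b2 + a2*b1) / (2*a2*b2 + a1*b2 + a2*b1)

def ratio_lambda(lam, b1, b2):
    """Second form of (33), with lambda = a2/a1."""
    return (2*b1 + lam*b1 + b2) / (2*lam*b2 + lam*b1 + b2)

def ratio_mu(mu, a1, a2):
    """Third form of (33), with mu = b2/b1."""
    return (2*a1 + mu*a1 + a2) / (2*mu*a2 + mu*a1 + a2)

def dratio_dlambda(lam, b1, b2):
    """Closed-form derivative (34) with respect to lambda."""
    return -2 * (b1 + b2)**2 / (2*lam*b2 + lam*b1 + b2)**2

a1, a2, b1, b2 = 10.0, 1.0, 2.0, 9.0     # hypothetical productions
lam, mu = a2 / a1, b2 / b1
r = ratio_direct(a1, a2, b1, b2)

h = 1e-6                                  # step for the central difference
fd = (ratio_lambda(lam + h, b1, b2) - ratio_lambda(lam - h, b1, b2)) / (2 * h)
print(r, ratio_lambda(lam, b1, b2), ratio_mu(mu, a1, a2), fd, dratio_dlambda(lam, b1, b2))
```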





Let now $x$ represent the amount of legislative or military service
provided by class I, while $y$ represents the amount of material goods
supplied by class II. Let us consider, as before, that while class I follows
the principle of “class heredity,” class II is characterized by a high
mobility. Denoting, as before (6), by $a_1'$, $a_2'$, $b_1'$, and $b_2'$ the
productions per individual, we have


G,’ >> bia << ke >> ey, Bb << bY. (35) 
For $a_1$, $a_2$, $b_1$, and $b_2$ we have (6):

$$a_1 = n_1'a_1' + n_1''a_2', \qquad a_2 = n_2'a_1' + n_2''a_2',$$

$$b_1 = n_1'b_1' + n_1''b_2', \qquad b_2 = n_2'b_1' + n_2''b_2'. \tag{36}$$
With the foregoing assumptions, $n_1'/n_1''$ decreases with time,
according to equations developed previously (2), while $n_2'/n_2''$ remains
constant. On the other hand, all the $n_i'$ and $n_i''$ increase. Thus $a_1$, $a_2$, $b_1$, and $b_2$ will
all increase. But, because of (34), $a_1$ and $b_1$ will increase more slowly
than $a_2$ and $b_2$, so that $a_1/a_2$ and $b_1/b_2$ decrease. Hence, according to
equations (32) and (34), $x_1/x_2$ and $y_1/y_2$ will decrease. In other words,
while class I receives a smaller fraction of goods from class II, at the
same time the relative amount of legislative privileges of class II as
compared with class I increases. In Figure 1 the variations of $x_1/x_2$,
as given by equation (33), are shown for the following choice of
constants: $a_1' = 10a_2'$, $b_2'/b_1' = 1.06$. The rates of increase of the
population for all classes are assumed the same, as in (2), with 6 = 0.062
years. The percentage a, of offspring of type I (2) is taken as 0.036.













For comparison, points taken from data compiled by P. Sorokin (12)
are plotted in Figure 1. In collaboration with a number of other
economists and sociologists, Sorokin has made tentative estimates of
the economic conditions of different social classes in different
countries. The estimates are based on a relative rating scale, ranging from
1 to 10. As Sorokin points out, the individual curves do not show any
definite trends. If, however, we take the ratio, for instance, of the values
for the nobility and the bourgeoisie in France, we find a definite
downward trend, as seen from Figure 1.

It may legitimately be questioned whether this ratio can be
identified with our $x_1/x_2$. Such an identification is justified only as a
rough approximation. The relative economic conditions of the two
classes at a given moment are functions not only of $x_1/x_2$ at that
moment, but also of the values of $x_1/x_2$ at previous times, since wealth
can be accumulated. Roughly, however, we may compare the relative
economic conditions of two classes with $x_1/x_2$. The bourgeois class
was chosen as representing class II, which is also only roughly true.
No definite conclusions should be drawn from Figure 1, which is given
merely as an illustration of how some theoretical conclusions of the
type discussed above may be compared with sociological data, if proper
data are available.

FIGURE 1
[Plot: $x_1/x_2$ against year, from 800 to 1900.]

One could in principle apply similar considerations to the varia- 
tion of civil laws with time, which may characterize the different 
rights, privileges, and obligations of different classes and may be 
identified with $y_1$ and $y_2$. Quantitative data of that type are very
difficult to obtain. It must be noted, however, that P. Sorokin (11) 
has made an interesting attempt to determine quantitative indices for 
the variations of criminal laws. An extension of Sorokin’s method to 
civil laws would be very desirable. 













Using expression (28) without any restrictions upon the
coefficients, we obtain similar, though more complicated, relations.

At this point it must be very strongly emphasized that the
foregoing mathematical theory of legislative change does not
constitute an “economic interpretation” of social phenomena and does
not commit us to such an interpretation. Neither does it commit us to
any other special sociological doctrine. At first glance one might think
that the shift of legislative tendencies, expressed by $y_1/y_2$, is the
result of the shift of the ratio $x_1/x_2$, which is an economic quantity.
The true cause of the variation is, however, the “thinning out” of
class I, which results in the loss of a number of capable legislators from
it. The parallel variation in $x_1/x_2$ and $y_1/y_2$ is a consequence of this
“thinning out” plus the special assumption about the satisfaction
functions. With somewhat more general assumptions about these
functions, we shall in general find $x_1/x_2 \neq y_1/y_2$, though the two
ratios may exhibit a certain parallelism in their variations.


IV. 


In a previous paper (3) we derived an equation for the
ratio of the urban population $N_u$ to the total population $N = N_u + N_r$,
where $N_r$ is the rural population. We then considered two different
criteria as possibly determining this ratio. One was the tendency of
each individual to migrate either to the city or to the country,
depending on which place offered a greater income. The other, a more
theoretical one, was the adjustment of each individual in such a way
as to make the total per capita income of the whole society a maximum.
Both lead to an expression

$$\frac{N_u}{N} = 1 - \frac{c^2}{N}, \tag{37}$$

the constant $c^2$ being different for the two cases.

A slightly different equation is obtained by generalizing our
assumptions somewhat. Whereas in (3) we considered for the per
capita production of rural inhabitants an expression of the form

$$p_r = \frac{a_1}{a_2 + N_r}, \qquad a_1 > 0, \; a_2 > 0, \tag{38}$$

while we set $p_u$ = const., we may for reasons similar to those in (3)
now set

$$p_u = \frac{b_1}{b_2 + N_u}, \qquad b_1 > 0, \; b_2 > 0. \tag{39}$$













Using the first criterion and therefore, as in (3), setting $p_r = p_u$, we
find in a similar way as before

$$\frac{N_u}{N} = A - \frac{B}{N}, \tag{40}$$

with

$$A = \frac{b_1}{a_1 + b_1}, \qquad B = \frac{a_1b_2 - a_2b_1}{a_1 + b_1}. \tag{41}$$

While according to (37) $N_u/N$ tends with increasing $N$ to unity,
according to (40) it tends to $A < 1$.
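The step from (38) and (39) to (40) and (41) is short: setting $p_r = p_u$ gives $a_1(N_u + b_2) = b_1(N_r + a_2)$, and substituting $N_r = N - N_u$ yields $N_u = AN - B$. The sketch below, with illustrative constants that are assumptions and not fitted to any data, solves $p_r = p_u$ by bisection and compares the result with (40).

```python
def urban_share(N, a1, a2, b1, b2, iters=200):
    """Solve p_r = p_u for N_u by bisection and return N_u/N.
    p_r from (38), p_u from (39); assumes the root lies in [0, N]."""
    def gap(Nu):                      # p_u - p_r: positive means the city pays more
        Nr = N - Nu
        return b1 / (b2 + Nu) - a1 / (a2 + Nr)
    lo, hi = 0.0, N                   # gap decreases in Nu, so there is one sign change
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / N

a1, a2, b1, b2 = 1.0, 1.0, 2.0, 20.0      # hypothetical constants
A = b1 / (a1 + b1)                        # (41)
B = (a1 * b2 - a2 * b1) / (a1 + b1)       # (41)
for N in (30.0, 60.0, 120.0):
    print(N, urban_share(N, a1, a2, b1, b2), A - B / N)
```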


FIGURE 2
[Plot: $N_u/N$ against $N$ (in $10^6$), for $N$ from 50 to 120.]
The curve represents equation (40) with $A = 0.722$, $B = 23.1$.
The points represent actual data for the United States.

Figure 2 shows the comparison of equation (40) with data for the
United States (13, p. 227). For Germany the simpler equation (37)
represents the facts well (14, p. 14). The latter requires that $N_r$ =
const., all the population increase going into the cities. This is the case
for Germany, since between 1871 and 1933 the rural population remained
practically constant, fluctuating between the extremes of $21623 \times 10^3$
and $22709 \times 10^3$. It may be asked why available data for the ratio
$N_u/N$ in the United States for the period from 1800 on were not used. The
reason is this: the constants $a_1$ and $a_2$ are by their definition (3)
functions of the total area of the country, and the latter gradually
increased in the United States during the first three quarters of the
last century. A more detailed theoretical study is required before the
theory can be applied to such a case.
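The figures quoted for Germany can be tied directly to equation (37) and Figure 3. If $N_r$ is constant, then $N_u/N = 1 - N_r/N$, so $c^2$ is simply the rural population. Reading the quoted extremes as 21,623 × 10³ and 22,709 × 10³ persons, and measuring $N$ in $10^6$, they bracket the value $c^2 = 22$ used for Figure 3. A minimal check of that arithmetic (the sample total populations are illustrative only):

```python
rural_min, rural_max = 21623e3, 22709e3   # extremes quoted in the text, in persons
c2 = 22.0                                 # constant used for Figure 3, with N in units of 10**6

def urban_share_37(N_millions, c2=c2):
    """Equation (37): N_u/N = 1 - c^2/N, with N in millions."""
    return 1.0 - c2 / N_millions

print(rural_min / 1e6, rural_max / 1e6)
for N in (40.0, 50.0, 60.0, 66.0):        # illustrative total populations, in millions
    print(N, round(urban_share_37(N), 3))
```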











FIGURE 3
[Plot: $N_u/N$ against $N$ (in $10^6$), for $N$ from 30 to 70.]
The curve is the graph of equation (37) with $c^2 = 22$. The points represent
actual data for Germany.


V. 


Formally somewhat similar considerations lead us to a theory of
interaction between the industrial and agricultural populations of a
country, again, of course, under rather oversimplified assumptions.

Let $N_i$ denote the number of individuals working in industry, $N_a$
that in agriculture. Let the total population be

$$N = N_i + N_a. \tag{42}$$

Denote by $p_i$, $p_a$, respectively, the amounts of goods produced per
capita in industry and agriculture per unit time, expressed in some
comparable units, for instance, in their dollar value. Both $p_i$ and $p_a$
will be functions of $N_i$ and $N_a$, as well as of the supply of natural
resources of the country. These very roughly can be divided into the
total area $S$ of land available, and the resources $R$ of ores, minerals,
etc. Thus

$$p_i = f_i(N_i, N_a, R, S), \qquad p_a = f_a(N_i, N_a, R, S). \tag{43}$$


Let $c_i$ and $c_a$ be the per capita consumptions in industry and
agriculture, respectively, per unit time. Let also an exchange of goods
between the two groups of individuals take place, so that the agricultural
individuals supply the industrial ones with an amount $N_a G_a$ of goods,
receiving in return a fraction $\theta$ of industrial goods. The quantities $G_a$
and $\theta$ will be connected by some kind of demand equation, so that

$$G_a = f(\theta). \tag{44}$$













For the rate of change of the total wealth $W_i$ and $W_a$ of the industrial
and agricultural populations we have (3, 4)

$$\frac{dW_i}{dt} = N_i(p_i - c_i) + N_a G_a - \theta N_a G_a, \qquad \frac{dW_a}{dt} = N_a(p_a - c_a) - N_a G_a + \theta N_a G_a. \tag{45}$$

The per capita rates $w_i$ and $w_a$ are

$$w_i = p_i - c_i + \frac{N - N_i}{N_i}\,G_a(1 - \theta), \qquad w_a = p_a - c_a - G_a(1 - \theta). \tag{46}$$

If every individual chooses his occupation in industry or in agriculture
according to whether $w_i$ or $w_a$ is larger, then there will be a shift of
population to industrial occupations if $w_i > w_a$ and to agricultural if
$w_i < w_a$. In equilibrium we have

$$w_i = w_a. \tag{47}$$


Because of (42), (43), and (45), equation (47) gives a relation
between $N_i$, $N$, $R$, $S$, and $\theta$. Or, introducing

$$\eta = \frac{N_i}{N}, \tag{48}$$

we may say that equation (47) furnishes a relation between $N$, $\eta$, $R$, $S$,
and $\theta$. For fixed values of $\theta$, $N$, $R$, and $S$, this relation gives us the
value of $\eta$, or, what is the same, the values of $N_i$ and $N_a$. But, because
of (46), this fixes the values of $w_i$ and $w_a$ as functions of $\theta$. If every
individual tries to make his $w_i = w_a$ a maximum, then $\partial w_i/\partial\theta = 0$ gives
us an equation for the determination of $\theta$. Once $\theta$ is determined as a
function of $\eta$, $N$, $R$, and $S$, we can express $\eta$ as a function of $N$, $R$, and
$S$. Thus

$$\eta = \eta(N, R, S). \tag{49}$$
Because of (48), equations (43) may be written thus:

$$p_i = F_i(N, \eta, R, S), \qquad p_a = F_a(N, \eta, R, S). \tag{50}$$

The total per capita rate of production of any goods is equal to

$$\text{p.c.} = \frac{N_i p_i + N_a p_a}{N} = \eta p_i + (1 - \eta) p_a, \tag{51}$$

and because of (50), may be expressed through $N$, $\eta$, $R$, $S$. Data for













$N$, $\eta$, and $S$ are readily available. The determination of $R$ presents
greater difficulties. In some cases, very roughly, we may assume $R \propto S$.
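The program just outlined can be run end to end only after $f_i$, $f_a$, and the demand relation (44) are specified, which the paper leaves open. The sketch below therefore assumes simple hypothetical forms, chosen only for illustration and not proposed by the author: $p_i = P_I/(1 + K_I\eta)$ and $p_a = P_A/(1 + K_A(1 - \eta))$ with $N$, $R$, and $S$ held fixed, and $G_a = G_0\theta$. For each $\theta$ it solves the equilibrium condition (47) for $\eta$ by bisection, then picks the $\theta$ that maximizes the common income $w_i = w_a$; a grid search stands in for the condition $\partial w_i/\partial\theta = 0$.

```python
# Hypothetical forms (not from the paper); N, R and S are held fixed.
P_I, K_I = 10.0, 1.0   # industrial output per head: p_i = P_I / (1 + K_I * eta)
P_A, K_A = 8.0, 1.0    # agricultural output per head: p_a = P_A / (1 + K_A * (1 - eta))
C_I, C_A = 1.0, 1.0    # per-capita consumption c_i, c_a
G0 = 2.0               # demand relation (44) taken as G_a = G0 * theta

def w_i(eta, theta):
    """Industrial per-capita rate (46); (N - N_i)/N_i = (1 - eta)/eta."""
    G = G0 * theta
    return P_I / (1 + K_I * eta) - C_I + (1 - eta) / eta * G * (1 - theta)

def w_a(eta, theta):
    """Agricultural per-capita rate (46)."""
    G = G0 * theta
    return P_A / (1 + K_A * (1 - eta)) - C_A - G * (1 - theta)

def equilibrium_eta(theta, iters=200):
    """Solve w_i = w_a, equation (47), for eta = N_i/N by bisection.
    With these forms w_i - w_a decreases in eta, so the root is unique."""
    lo, hi = 1e-9, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if w_i(mid, theta) > w_a(mid, theta):
            lo = mid   # industry still pays more: people move into industry
        else:
            hi = mid
    return 0.5 * (lo + hi)

thetas = [k / 100 for k in range(1, 100)]
best_theta = max(thetas, key=lambda t: w_i(equilibrium_eta(t), t))
eta_star = equilibrium_eta(best_theta)
print(best_theta, eta_star, w_i(eta_star, best_theta), w_a(eta_star, best_theta))
```

Replacing the placeholder forms by empirically grounded $f_i$, $f_a$, and (44) would leave the two-stage structure (solve (47) for $\eta$, then optimize over $\theta$) unchanged.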

The foregoing constitutes quite a program for further
mathematical research. To illustrate, however, what kind of relations may be
obtained in this way, let us consider the following crude example.
Equations (50) give us

$$\eta = U(N, R, S, p_i). \tag{52}$$

It is readily seen that in general $\eta$ increases with $p_i$. Whatever the
exact relation (52) between $\eta$ and $p_i$, as a first approximation we may
consider

$$\eta \propto p_i. \tag{53}$$


At the same time, for a given $\eta$, $p_i$ will be the greater, the larger the
natural resources available per capita. These will roughly vary
inversely as the population density $d$. Yet obviously even for $d$ tending
to zero, $p_i$ will not exceed a finite value, since it is limited by the ability
of an individual to produce goods even with an infinite supply of raw
material. We therefore put

$$p_i \propto \frac{1}{\beta + d}, \tag{54}$$

$\beta$ being a constant.

FIGURE 4
[Plot: p.c. for different countries, calculated (x) and observed (o).]
Values of p.c. for different countries: 1, United States; 2, Canada; 3,
Switzerland; 4, Norway; 5, Germany; 6, Finland; 7, Russia; 8, Japan.
If, as is usually the case, $p_a \ll p_i$, then, approximately, we have
from equation (51):

$$\text{p.c.} = \eta p_i. \tag{55}$$

This, combined with (54), gives

$$\text{p.c.} = \frac{A}{(\beta + d)^2}. \tag{56}$$













We may expect that p.c. will vary approximately as the per capita income
in the country. The extent to which even the crude equation (56) is satisfied
in some cases is shown in Figure 4. That figure is by no means
intended as corroborating equation (56), but rather to show how
relations obtained by theoretical considerations may be compared with
available data.

Countries having large colonies are intentionally excluded from
Figure 4. While data on $\eta$ are available for most principal countries,
none are available for colonies. There are also other complicating
factors that enter when we treat the problem of a country
with a multiple-bounded contour. All this should stimulate the
theoretical study of the general problem, as outlined above. The exact
expression of $p_i$ in terms of $\eta$, $d$, and $S$, obtained from equations (53)
and (54), will undoubtedly be much more complicated than equation
(56), and it is that complicated equation that should be compared with
actual data.

The author is indebted to Dr. Alston S. Householder for critical
comments.


REFERENCES 


1. Rashevsky, N. Further contributions to the mathematical theory of human 
relations. Psychometrika, 1936, 1, 21-31. 

2. Rashevsky, N. Studies in mathematical theory of human relations. Psycho- 
metrika, 1939, 4, 221-239. 

3. Rashevsky, N. Studies in mathematical theory of human relations II. Psy-
chometrika, 1939, 4, 283-299. 

4. Rashevsky, N. Contributions to the mathematical theory of human relations 
III. Psychometrika, 1940, 5, 203-210. 

5. Rashevsky, N. Contributions to the mathematical theory of human relations 
IV. Outline of a mathematical theory of individual freedom. Psychometrika, 
1940, 5, 299-303. 

6. Rashevsky, N. Note on the mathematical theory of interaction of social 
classes. Psychometrika, 1941, 6, 43-47. 

7. Rashevsky, N. and Householder, A. S. On the mutual influence of indi- 
viduals in a social group. Psychometrika, 1941, 6, 317-321. 

8. Rashevsky, N. On the variation of the structure of a social group with time. 
Psychometrika, 1941, 6, 273-277. 

9. Evans, Griffith C. Mathematical Introduction to Economics. New York: 
McGraw-Hill, 1930. 

10. Thurstone, L. L. The indifference function. J. soc. Psychol., 1931, 2, 139-167.

11. Sorokin, P. Social and cultural dynamics. Vol. II. New York: American 
Book Co., 1937. 

12. Ibid. Vol. III. New York: American Book Co., 1937. 

13. Reinhardt, J. M. and Davis, G. R. Principles and methods of sociology. 
New York: Prentice-Hall, Inc., 1932. 

14. Statistisches Jahrbuch für das Deutsche Reich, 1937.

















PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


AN APPRAISAL OF THE VALIDITY OF THE FACTOR LOAD- 
INGS EMPLOYED IN THE CONSTRUCTION OF THE 
PRIMARY SOCIAL ATTITUDE SCALES 


LEONARD W. FERGUSON AND WARREN R. LAWRENCE 
UNIVERSITY OF CONNECTICUT 


In this article the authors examine the effect of including alter- 
nate test forms in a factor matrix upon the validity of the resultant 
factor loadings, finding that in this particular instance the effect is 
negligible. Comparisons of the factor loadings derived from matrices 
in which only one of the alternate test forms is included with those 
in which both forms are included reveal practically no difference in 
the magnitude of either the original or rotated factor loadings, or in 
that of the computed communalities. 

The primary attitude scales prepared by the senior author (2, 3) 
are based upon factorial analyses of correlational matrices which in- 
clude data from both forms of each of the (Thurstone) attitude scales 
employed. There appears to be a question,* however, concerning the
validity of loadings when two forms of a test are included in a factor
matrix, so we have reanalyzed all the original data, treating those for
alternate forms of the tests separately. It is the purpose of this
research, therefore, to determine whether the factor loadings previously
presented (2, 4) are inflated by the inclusion of alternate forms of the 
tests in various factor matrices, and if so, to determine the effects of 
this tendency on the validity of those factor loadings. 

From the original matrix (4) showing the intercorrelations 
among both forms of scales for the measurement of attitudes toward 
God (1), evolution (11), birth control (12), war (7), capital punish- 
ment (6), treatment of criminals (13), law (5), censorship (8), 
patriotism (9) and communism (10), six different factor matrices 
were formed. Three of these include the correlations among various 
combinations of the A forms of each of the scales, whereas the other 
three include the intercorrelations among various combinations of the 
B forms. Two of the matrices (one for the A forms, the other for the 
B forms) include the correlations among scales for the measurement 
of attitudes toward God, evolution, birth control, war, capital punish- 
ment, and the treatment of criminals. These will be recognized (see 4) 
as the scales which define primary social attitudes I (Religionism) and 


* Raised by Dr. Quinn McNemar, Stanford University. 
135 











136 PSYCHOMETRIKA 


II (Humanitarianism) . Two other matrices (one for the A forms, the 
other for the B forms) include the correlations among scales for the 
measurements of attitudes toward God, evolution, birth control, law, 
censorship, patriotism, and communism. These will be recognized (see 
2, 4) as the scales which define primary social attitudes I (Religion- 
ism) and III (Nationalism). The remaining two matrices (one for 
the A forms and the other for the B forms) include the correlations 
among scales for the measurement of attitudes toward war, capital 
punishment, treatment of criminals, law, censorship, patriotism, and 
communism. These of course are the scales (see 2, 4) which define 
primary social attitudes II (Humanitarianism) and III (National- 
ism). The reader will note that these matrices completely reduplicate 
all those previously presented, except of course that alternate forms 
of the scales are treated separately. 
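The sketch below shows in rough terms how one matrix per form can be cut from a table that holds both forms. It is a minimal Python example: it keeps only the rows and columns for a chosen set of scales. The labels such as "God-A", the function name `submatrix`, and the correlations are all hypothetical. They are not the values of reference (4).

```python
def submatrix(r, labels, keep):
    """Return the rows and columns of the square matrix r (list of lists)
    whose labels appear in keep, in the order given by keep."""
    index = {label: i for i, label in enumerate(labels)}
    rows = [index[k] for k in keep]  # KeyError for an unknown label
    return [[r[i][j] for j in rows] for i in rows]


# Hypothetical labels: each scale appears twice, once per alternate form.
labels = ["God-A", "God-B", "War-A", "War-B"]
# Made-up correlations, for illustration only.
r = [
    [1.00, 0.85, 0.30, 0.28],
    [0.85, 1.00, 0.27, 0.31],
    [0.30, 0.27, 1.00, 0.80],
    [0.28, 0.31, 0.80, 1.00],
]
a_forms = submatrix(r, labels, ["God-A", "War-A"])
b_forms = submatrix(r, labels, ["God-B", "War-B"])
print(a_forms)  # [[1.0, 0.3], [0.3, 1.0]]
print(b_forms)  # [[1.0, 0.31], [0.31, 1.0]]
```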

Each of these matrices was analyzed by the centroid method. The 
patterns exhibited by the loadings secured from the separate analyses 
are strikingly similar. The mean difference between the factor load-
ings determined from the matrices for the A forms and those for the
B forms is only .06 (range .00 to .17). Twenty-seven of the 40 differences
are .07 or less, while only 4 are over .10. Comparisons of the factor loadings
secured from the analyses of the intercorrelations among the separate 
forms of the tests with those previously obtained (when data from 
both forms of the tests were included in the same matrices) show that 
the average amounts by which the factor loadings tend to be inflated 
by the inclusion of both forms of the attitude scales in the factor 
matrices are: (1) for all the comparisons in which Factor I (Religion- 
ism) is involved, .02; (2) for all the comparisons in which Factor II 
(Humanitarianism) is involved, .03; and (3) for all the comparisons 
in which Factor III (Nationalism) is involved, .03. The average in- 
crease for all 211 comparisons is .03. In only 7 instances are there 
discrepancies of .10 or more. In view of the fact that the standard 
errors of the factor loadings are in the neighborhood of .07 or perhaps 
larger, it seems legitimate to conclude that the loadings previously 
presented (2, 4) are not seriously inflated. 
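Below is a minimal Python sketch of one step of the centroid method and of the kind of difference summary reported above. It simplifies Thurstone's full procedure in two assumed ways. First, communality estimates are taken to be already in the diagonal. Second, every column sum is taken to be positive, so no sign reflection is needed. The function names and the example loadings are hypothetical, not the study's data. Differences are rounded to two decimals, as loadings are reported, so that the .07 and .10 cut-offs are not upset by floating-point noise.

```python
import math


def centroid_factor(r):
    """Extract one centroid factor from the square matrix r (list of lists).

    Assumes the diagonal holds communality estimates and that all column
    sums are positive (no sign reflection)."""
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("total sum must be positive; reflect variables first")
    loadings = [s / math.sqrt(total) for s in col_sums]
    # Residual matrix from which a further centroid factor would be taken.
    residual = [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual


def compare_loadings(a, b):
    """Summarize absolute differences between paired loadings."""
    diffs = [round(abs(x - y), 2) for x, y in zip(a, b)]
    return {
        "mean": sum(diffs) / len(diffs),
        "min": min(diffs),
        "max": max(diffs),
        "n_at_most_07": sum(d <= 0.07 for d in diffs),
        "n_over_10": sum(d > 0.10 for d in diffs),
    }


# A matrix generated exactly by one factor: the centroid step recovers it.
true = [0.8, 0.7, 0.6, 0.5]
r = [[x * y for y in true] for x in true]
loadings, residual = centroid_factor(r)
print([round(v, 3) for v in loadings])
# Hypothetical A-form versus B-form loadings.
print(compare_loadings([0.80, 0.70, 0.60, 0.50], [0.75, 0.72, 0.60, 0.38]))
```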

In the original analyses (in which both forms of all tests were
included) the communalities on the average are: (1) for the matrix
in which Factors I and II appeared, .05 points higher; (2) for the
matrix in which Factors I and III appeared, .04 points higher; and
(3) for the matrix in which Factors II and III appeared, .03 points
higher than in the later analyses (when each of the forms was analyzed
in separate matrices). The average amount by which all 40
communalities tend to be inflated is .04. Certainly the extent of this
inflation cannot be considered serious.
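The communality comparison works on the usual definition: a test's communality is the sum of its squared loadings over the common factors. The minimal Python sketch below computes it and averages the inflation. The function names and loadings are hypothetical and chosen only for illustration.

```python
def communalities(loadings):
    """Communality of each test: the sum of its squared factor loadings.

    loadings is a list of rows, one row per test, one column per factor."""
    return [sum(a * a for a in row) for row in loadings]


def mean_inflation(original, separate):
    """Average amount by which the original values exceed the separate ones."""
    return sum(o - s for o, s in zip(original, separate)) / len(original)


# Hypothetical two-factor loadings for three tests (not the study's data).
both_forms = [[0.70, 0.30], [0.60, 0.40], [0.50, 0.20]]
one_form = [[0.68, 0.28], [0.58, 0.37], [0.47, 0.18]]
h2_both = communalities(both_forms)
h2_one = communalities(one_form)
print([round(h, 4) for h in h2_both])             # [0.58, 0.52, 0.29]
print(round(mean_inflation(h2_both, h2_one), 4))  # 0.0409
```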

The amounts by which the rotated factor loadings previously pre- 
sented were inflated are also very small, averaging .05, .03, and .04 for 
the matrices including: (1) Factors I and II; (2) Factors I and III; 
and (3) Factors II and III, respectively. The inflationary tendency is 
so minor that its effects can quite legitimately be overlooked. 


SUMMARY 


We have in this article re-examined the factor loadings presented 
by the senior author in connection with the isolation and measurement 
of some primary social attitudes in order to determine the presence 
or absence of certain inflationary tendencies. We wished to discover 
whether or not the inclusion of both of the alternate forms of the 
same attitude scales in a factor matrix would seriously inflate the 
factor loadings obtained. 

All the matrices previously presented (2, 3, 4) were subdivided 
into those which contained data only from the A forms of each of the 
appropriate scales and into those which contained data only from the 
B forms of the scales. Each of these matrices was analyzed by the 
centroid method and the factor loadings obtained were compared with 
those previously presented. For every combination of scales studied,
the results indicate quite conclusively that including both alternate
forms of the attitude scales in the original factor matrices did not
seriously inflate the factor loadings obtained. There was a slight
tendency in this direction, but in all cases the inflation was quite
negligible. The writers do not claim to know, however, whether this
conclusion can be generalized beyond the specific
research in question. 


REFERENCES 


1. Chave, E. J. and Thurstone, L. L. Scale for the measurement of attitude
toward God. Chicago: Univ. Chicago Press, 1931.

2. Ferguson, L. W. The isolation and measurement of nationalism. J. Soc.
Psychol. (in press).

3. Ferguson, L. W. The measurement of primary social attitudes. J. Psychol.,
1940, 10, 199-205.

4. Ferguson, L. W. Primary social attitudes. J. Psychol., 1939, 8, 217-223.

5. Katz, D. Scale for the measurement of attitude toward the law. Chicago:
Univ. Chicago Press, 1931.

6. Peterson, R. C. Scale for the measurement of attitude toward capital
punishment. Chicago: Univ. Chicago Press, 1931.

7. Peterson, R. C. Scale for the measurement of attitude toward war.
Chicago: Univ. Chicago Press, 1931.

8. Rosander, A. C. and Thurstone, L. L. Scale for the measurement of attitude
toward censorship. Chicago: Univ. Chicago Press, 1931.

9. Thiele, M. B. and Thurstone, L. L. Scale for the measurement of attitude
toward patriotism. Chicago: Univ. Chicago Press, 1931.

10. Thurstone, L. L. Scale for the measurement of attitude toward communism.
Chicago: Univ. Chicago Press, 1931.

11. Thurstone, T. G. Scale for the measurement of attitude toward evolution.
Chicago: Univ. Chicago Press, 1931.

12. Wang, C. K. A. and Thurstone, L. L. Scale for the measurement of attitude
toward birth control. Chicago: Univ. Chicago Press, 1930.

13. Wang, C. K. A. and Thurstone, L. L. Scale for the measurement of attitude
toward the treatment of criminals. Chicago: Univ. Chicago Press, 1931.






















PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


RESPONSE RELAY 


J. E. P. LIBBY 


PSYCHOMETRIC LABORATORY, UNIVERSITY OF CHICAGO 


Controlled by microphone, photocell, or keys, the instrument described
here operates a chronoscope, chronograph, etc., together with a
projector, bell, light, or other stimulus requiring up to 200 watts.


This instrument is suitable for starting or stopping any 110-volt 
AC or DC apparatus up to 300 watts in response to sound impulse, 
change in light intensity, or closing of low voltage manual key. Its 
principal use in the Psychometric Laboratory at the University of 
Chicago was in controlling a projector lamp and chronoscope to meas- 
ure verbal response time to multiple-choice material projected from 
microfilm. It is built around a resistance-capacitance coupled audio
amplifier, operable on standard 110-volt AC or DC lines and conventional
in design except for the filter circuit, in which a 500-ohm resistor
replaces the more usual choke. This filter provides adequate smoothing
for the purpose at hand while reducing cost, space, and weight.
The microphone (M) used is a “Quam Permanic,” selected first be- 
cause it requires no exciting voltage, thus eliminating batteries or 
additional power pack, and second because of its very low cost. It was 
found desirable to extend the tension adjustment (K) of the relay 
outside the case; this permits convenient compensation for “warm- 
up” and line voltage fluctuation. 

If the total load of the controlled devices (e.g., flood-lamp) ex- 
ceeds 300 watts, an additional power relay must be used; the portable 
response timer described by Libby and Hunsicker elsewhere in this
journal* has been found satisfactory in a number of such applications. 

With the plates of the rectifier connected to “X,” the relay acts
like a latch relay: the output circuit, once broken by a response
impulse, remains open until reset by Sw₁ (allowing time, for example,
to read a chronoscope). This is because the relay cuts off the amplifier
plate supply at the same time as it breaks the output circuit. Where
graphic recording or another set-up requiring automatic reset is used,
the rectifier plates are connected to “Y”; this gives a momentary break
in the output circuit for each input impulse, followed by immediate
restoration.

* Libby, J. E. P. and Hunsicker, A. L. Portable response timer. Psycho-
metrika, 1941, 6, 401-403.
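The two wiring options can be summarized as a small state model. The following Python sketch models the switching logic only. It does not simulate the circuit electrically and ignores relay timing, tension adjustment, and warm-up. The class and method names are hypothetical.

```python
class ResponseRelay:
    """Toy model of the output circuit in the two wiring modes.

    mode "X": latching; an impulse opens the output until reset() is called.
    mode "Y": momentary; the output is open only while the impulse lasts.
    """

    def __init__(self, mode):
        if mode not in ("X", "Y"):
            raise ValueError("mode must be 'X' or 'Y'")
        self.mode = mode
        self.output_closed = True

    def step(self, impulse):
        """Advance one time step; impulse is True while sound or light is present."""
        if self.mode == "X":
            if impulse:
                # The relay also cuts off the amplifier plate supply,
                # so it cannot restore itself: the output stays open.
                self.output_closed = False
        else:
            self.output_closed = not impulse
        return self.output_closed

    def reset(self):
        """Model of pressing the reset switch: restore the output circuit."""
        self.output_closed = True


inputs = [False, True, False, False]
latch = ResponseRelay("X")
momentary = ResponseRelay("Y")
print([latch.step(i) for i in inputs])      # [True, False, False, False]
print([momentary.step(i) for i in inputs])  # [True, False, True, True]
latch.reset()
print(latch.output_closed)                  # True
```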

The sensitivity may be controlled either by the input potentiometer
or by the relay tension adjustment, and is such that “e” as
normally spoken by a woman may be reliably recorded. Where room
noise is excessive, the Quam microphone may be replaced by a Brush
crystal earphone held against the subject’s throat; this device has been
found equally sensitive to verbal response and almost unaffected by
external sounds. With either microphone it is essential to use a
shielded input cable.





LEGEND
M     Microphone
Sw₁   Reset Switch
K     Tension Adjustment
OP    Output
      Potentiometer
S     Shielded Cable
L     Line Cord


When manual responses are used, the keys may be connected to 
the same terminals used for the microphone; unprotected keys may be 
used since the voltage across them is trivial. When recording fluctuations
of light intensity, a photocell may be connected to the ungrounded
end of the potentiometer and to the 6J7 plate (P); this supplies
adequate exciting voltage, and the polarity is correct when the
connections are collector-to-plate and emitter-to-potentiometer.

All parts are standard and available through radio supply houses. 
The total cost for material, including a 6 x 6 x 5-inch steel case and
microphone, is less than $15.00.




















[Figure: WIRING DIAGRAM FOR RESPONSE RELAY, J. E. P. Libby. Parts shown:
Quam Permanic microphone (M); shielded cable; 0.5-megohm potentiometer;
relay, 1000 ohms; terminals X and Y; resistors of 3000 ohms, 25,000 ohms,
1 megohm, 0.1 megohm, 0.8 megohm, 150 ohms, and 500 ohms; 200 ohms in
line cord; capacitors of 5 mfd., 0.1 mfd., 0.4 mfd., and 5 mfd., and two
20-mfd. electrolytic capacitors, 150 v. working; hand switch (reset); line
switch attached to potentiometer; SPDT toggle switch; 110-volt AC-DC line;
ground.]

















PSYCHOMETRIKA—VOL. 7, NO. 2 
JUNE, 1942 


A NOTE ON THE COMPUTATION OF BISERIAL r 
IN ITEM VALIDATION 


PHILIP H. DUBOIS 
UNIVERSITY OF NEW MEXICO 


A method of computing biserial coefficients of correlation through 
the use of punch card tabulating equipment is presented. Each item 
is assigned a separate column and successes are punched 1. By ar- 
ranging the cards on the criterion variable and obtaining progressive 
sums on several columns simultaneously, it is possible to obtain data 
for several correlations in one run of the cards through the machine. 


When biserial coefficients of correlation are to be computed be- 
tween a large number of dichotomous variables and a single series of 
graduated scores as in item validation, the use of punch card equip- 
ment speeds the work enormously. One punch card method, which re- 
quires, in addition to the machine steps necessary to find the mean and 
the standard deviation of the criterion variable, a sort of all the cards 
and tabulation of part of them for each biserial r, has been described
by Royer.* 

The method to be described here is several times faster in that 
only one complete sort is necessary and the data for several biserial 
r’s can be found simultaneously, the number being limited by the 
capacity of the tabulator. Using the standard 80-column card, separate 
columns are assigned to each of the items. Successes are punched 1 
and failures are passed over or punched 0. The criterion variable is 
punched in appropriate columns. With a 2-digit criterion variable, 78 
dichotomous variables or items can be punched on a single card. If 
there are more than 78 items, one or more additional cards will be
necessary for each subject, with the criterion score punched in each. 
The mean and standard deviation for the criterion variable are ob- 
tained by the usual progressive digiting method as described by Royer. 
In so doing, the cards will be arranged in order from the highest to 
the lowest criterion scores. 

If there are gaps in the series, they must be filled with cards 
punched only with the criterion variable. 

The tabulator is then adjusted to control on the units digit of the
criterion variable and to obtain cumulative totals on several columns
carrying the items. A summary punch is attached to the tabulator so
that the set of progressive totals obtained with each change in the
criterion variable is transferred to summary cards. If a numerical
tabulator with five adding blanks is used, 10 items can be handled
simultaneously if N is large, and 20 if no more than 99 subjects have
passed any item. A large alphabetic tabulator, if equipped with the
progressive totalling feature, has even greater capacity.

* Royer, Elmer B., “A Machine Method for Computing the Biserial Correla-
tion Coefficient in Item Validation,” Psychometrika, 1941, 6, 55-59.
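The following is a minimal Python sketch of what the tabulator run produces. The cards are ordered from the highest to the lowest criterion score. One summary card is made for every score value, with a skipped score standing in for the blank card that fills a gap. Each summary card carries the running count of passes on each item. The card representation and the function name are assumptions made for illustration.

```python
def progressive_totals(cards, n_items):
    """Simulate the tabulator run with summary punching.

    cards: list of (item_responses, criterion) pairs; responses are 0 or 1.
    Returns summary cards (criterion, [running pass count per item]), one
    for every criterion value from the highest down to the lowest.
    """
    cards = sorted(cards, key=lambda card: card[1], reverse=True)
    high, low = cards[0][1], cards[-1][1]
    running = [0] * n_items
    summary = []
    i = 0
    for score in range(high, low - 1, -1):
        # Add in every data card with this criterion score (none for a gap).
        while i < len(cards) and cards[i][1] == score:
            for k, passed in enumerate(cards[i][0]):
                running[k] += passed
            i += 1
        summary.append((score, list(running)))
    return summary


# Hypothetical cards for two items; no card has a criterion score of 4.
cards = [([1, 0], 5), ([0, 1], 3), ([1, 1], 5), ([1, 0], 2)]
for summary_card in progressive_totals(cards, 2):
    print(summary_card)
```

The last summary card's totals are the $N_p$ values, one per item.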

Ordinarily no criterion score will be 0. If there are such criterion 
scores, the corresponding cards should be eliminated immediately 
after finding the mean and the standard deviation. If the lowest cri- 
terion score is in the neighborhood of 1, blank cards punched only 
with the continuation of the criterion variable down through 1 may 
be introduced into the series, and Royer’s formula used without 
change. Otherwise K should be computed by subtracting 1 from the
lowest criterion score. The Pearson formula for biserial r then
becomes

$$ r_{bis} = \frac{\Sigma P' - N_p (M_t - K)}{z N \sigma_t} $$

in which $\Sigma P'$ is the sum of the progressive totals on each dichotomous
variable or item, $N_p$ is the number of individuals passing that item,
$N$ is the total number of individuals, $M_t$ is the mean of all the
criterion scores, $\sigma_t$ is their standard deviation, and $z$ is the
ordinate of the normal distribution curve at the point of dichotomy as
found in the Kelley-Wood tables.

After the summary cards are made for each group of items, the
tabulator is adjusted to add, and $\Sigma P'$ (the sum of the cumulative
totals on each item) is obtained by adding the numbers punched on the
summary cards. The number of summary cards in each set will be
equal to the highest criterion score plus 1 minus the lowest. Several
items can be handled simultaneously, but generally not as many as in
the original tabulation. Here again, the speed of the work is limited
only by the capacity of the machine.

$N_p$, the number passing each item, is found from the dials or the
tape at the end of the first tabulation. It is the last progressive total
for each item.

In the use of the formula, only $\Sigma P'$, $N_p$, and $z$ will vary from item
to item. Before finding $z$, $N_p$ will of course be divided by $N$.
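As a check on the algebra, the following minimal Python sketch computes $r_{bis}$ from $\Sigma P'$ and compares it with the textbook form $(M_p - M_t)/\sigma_t \cdot p/z$. It uses `statistics.NormalDist` in place of the Kelley-Wood tables. It takes $\sigma_t$ as the population standard deviation, which is an assumption here, though the identity holds for either choice as long as both forms use the same $\sigma_t$. The data and function name are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def biserial_r(sum_p_prime, n_pass, n, mean_t, sd_t, k):
    """Biserial r computed from the sum of progressive totals.

    z is the normal ordinate at the point of dichotomy (NormalDist is used
    here instead of the Kelley-Wood tables)."""
    p = n_pass / n
    z = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (sum_p_prime - n_pass * (mean_t - k)) / (z * n * sd_t)


# Hypothetical criterion scores and item responses (1 = pass).
scores = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]
passed = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

k = min(scores) - 1
# Sum of progressive totals, via the identity sum of (Y - K) over passers.
sum_p_prime = sum(s - k for s, x in zip(scores, passed) if x)
mean_t, sd_t = fmean(scores), pstdev(scores)
r1 = biserial_r(sum_p_prime, sum(passed), len(scores), mean_t, sd_t, k)

# Textbook form for comparison: (M_p - M_t) / sigma_t * p / z.
p = sum(passed) / len(scores)
z = NormalDist().pdf(NormalDist().inv_cdf(p))
m_p = fmean([s for s, x in zip(scores, passed) if x])
r2 = (m_p - mean_t) / sd_t * p / z
print(math.isclose(r1, r2))  # True
```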


Arithmetical Explanation 


Since successes are punched 1 and failures are punched 0 or
passed over in the column assigned to that item, summing on that
column will give $N_p$, the number of passes, and the last progressive
total is that sum.

If the range of the criterion variable, defined as the highest score
plus 1 less the lowest score, is denoted as $n$, then the 1 on each card
carrying the highest criterion score will be represented $n$ times in the
set of progressive totals for that item; the 1 on the cards carrying
the next highest score, $n - 1$ times; and so on. The grand total of the
cumulative totals will therefore be the sum of the criterion scores of
those passing the item, less $KN_p$. Our formula takes advantage of the
fact that $\Sigma P = \Sigma P' + KN_p$.
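The identity $\Sigma P = \Sigma P' + KN_p$ can be checked directly. The following minimal Python sketch sums one item's progressive totals over every score level, gaps included, and compares the result with the plain sum of the passers' criterion scores. The data and function name are hypothetical.

```python
def sum_of_progressive_totals(scores, passed):
    """Sum of one item's progressive totals: one summary card for every
    score from the highest criterion score down to the lowest."""
    total = 0
    for level in range(max(scores), min(scores) - 1, -1):
        # Progressive total at this level: passes among cards scored >= level.
        total += sum(x for s, x in zip(scores, passed) if s >= level)
    return total


# Hypothetical data; note that no card has a score of 12 (a gap).
scores = [14, 13, 13, 11, 10, 10, 9]
passed = [1, 0, 1, 1, 0, 1, 0]

k = min(scores) - 1                                       # K = 8
n_p = sum(passed)                                         # 4 passes
sum_p_prime = sum_of_progressive_totals(scores, passed)   # 16
sum_p = sum(s for s, x in zip(scores, passed) if x)       # 48
print(sum_p == sum_p_prime + k * n_p)  # True
```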

A simple numerical example is given below, showing the work 
for five items. To the left of the criterion variable, each line below the 
item designations represents a data card. To the right of the criterion 
variable, each line represents a summary card carrying progressive 
totals. 


Data Cards                          Summary Cards
Item 1       Criterion Variable     Progressive Total, Item 1
  1                 22
  1                 22                      2
(Blank Card)        21                      2
  0                 20                      2
  0                 19
  0                 19
  1                 19                      3
  1                 18                      4
                                    Sum of progressive totals: 13

[Entries for Items 2 to 5 are not legible.]


The lowest criterion score is 18 and therefore K is 17. For Item 1, $N_p$
is 4 and $\Sigma P'$ is 13. If $\Sigma P$ were desired, it would be 13 plus 4 times 17,
or 81. Since, however, the formula has been adjusted to work with
$\Sigma P'$ directly, it is never necessary to compute $\Sigma P$.
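The Item 1 figures can be reproduced in a few lines of Python. The individual Item 1 responses used here are an inference rather than a quotation. Among the example's data-card scores (22, 22, 20, 19, 19, 19, 18), the only four that sum to 81 are 22, 22, 19, and 18, so those cards are taken as the passes. The blank card for 21 carries no item punches and is therefore not listed, but its level still receives a progressive total.

```python
# Criterion scores of the data cards in the example (blank card omitted).
scores = [22, 22, 20, 19, 19, 19, 18]
# Inferred Item 1 responses: passes at 22, 22, one 19, and 18.
item1 = [1, 1, 0, 0, 0, 1, 1]

k = min(scores) - 1                                # K = lowest score minus 1
n_p = sum(item1)                                   # number passing Item 1
levels = range(max(scores), min(scores) - 1, -1)   # 22, 21, 20, 19, 18
progressive = [sum(x for s, x in zip(scores, item1) if s >= lv) for lv in levels]
sum_p_prime = sum(progressive)
sum_p = sum_p_prime + k * n_p

print(k, n_p, progressive, sum_p_prime, sum_p)
# 17 4 [2, 2, 2, 3, 4] 13 81
```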

The method can be readily used with the point biserial coefficient
of Richardson and Stalnaker,* which does not assume normality in
the distribution of the dichotomous variable. The formula for this
coefficient may be written

$$ r_{pbis} = \frac{\Sigma P' - N_p (M_t - K)}{\sigma_t \sqrt{N_p N_q}} $$

in which $N_q$ is the number of failures, found by subtracting $N_p$ from $N$.

* Richardson, M. W., and Stalnaker, J. M., “A Note on the Use of Biserial
r in Test Research,” Journal of General Psychology, 1933, 8, 463-65.
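The point biserial coefficient is the ordinary product-moment r between the 0/1 item and the criterion. This gives a convenient check. The minimal Python sketch below computes $r_{pbis}$ from $\Sigma P'$ and compares it with a direct Pearson r. It assumes $\sigma_t$ is the population standard deviation (divisor $N$), which the equality requires. The data and function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def point_biserial(sum_p_prime, n_pass, n, mean_t, sd_t, k):
    """Point biserial r from the sum of progressive totals."""
    n_fail = n - n_pass
    return (sum_p_prime - n_pass * (mean_t - k)) / (sd_t * math.sqrt(n_pass * n_fail))


def pearson(x, y):
    """Ordinary product-moment r, using population moments."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical criterion scores and item responses (1 = pass).
scores = [14, 13, 13, 11, 10, 10, 9]
passed = [1, 0, 1, 1, 0, 1, 0]

k = min(scores) - 1
sum_p_prime = sum(s - k for s, x in zip(scores, passed) if x)
r_pb = point_biserial(sum_p_prime, sum(passed), len(scores),
                      fmean(scores), pstdev(scores), k)
print(math.isclose(r_pb, pearson(passed, scores)))  # True
```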










Checks on Computation 

If the criterion variable is not the sum of the items, the chief
check on the accuracy of the machine work is finding the $N_p$'s by
simple addition on the columns assigned to the items.

If the criterion variable is the sum of the correct items, then
additional checks become possible. In computing $M_t$ and $\sigma_t$, both
$\Sigma Y_t$ and $\Sigma Y_t^2$ have been found by progressive digiting. Then the sum
of all the $N_p$'s equals the sum of the criterion variable, that is,
$\Sigma N_p = \Sigma Y_t$, and the total of all the sums of the criterion scores for
those passing each item equals the sum of the squares of the criterion
variable, that is, $\Sigma\Sigma P = \Sigma Y_t^2$. With the method described, however,
the $\Sigma P'$'s are found instead of the $\Sigma P$'s, so that the check becomes
$\Sigma\Sigma P' = \Sigma Y_t^2 - K\Sigma Y_t$.
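When the criterion is the number of items passed, both checks can be verified on any response matrix. The minimal Python sketch below does this. For brevity it obtains each item's $\Sigma P'$ from the identity $\Sigma P' = \Sigma(Y_t - K)$ over those passing, rather than by summing progressive totals. The response matrix and function name are hypothetical.

```python
def run_checks(responses):
    """Verify the two checks for a criterion equal to the number of items passed.

    responses: list of 0/1 lists, one per person, one entry per item."""
    scores = [sum(r) for r in responses]
    k = min(scores) - 1
    n_items = len(responses[0])
    n_p = [sum(r[j] for r in responses) for j in range(n_items)]
    # Each item's sum of progressive totals, via sum of (Y - K) over passers.
    sum_p_prime = [sum(y - k for y, r in zip(scores, responses) if r[j])
                   for j in range(n_items)]
    check1 = sum(n_p) == sum(scores)
    check2 = sum(sum_p_prime) == sum(y * y for y in scores) - k * sum(scores)
    return check1, check2


# Hypothetical responses of five people to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
print(run_checks(responses))  # (True, True)
```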

Alternative Method 

If no summary punch is available and the progressive totals must
be summed from the tape, an alternative method involving $\Sigma P$ is
recommended, as adding-machine work is reduced. This method is
essentially the same as that used in obtaining $\Sigma XY$ in correlation
problems.

The first step is to sort on the units digit of the criterion variable. 
The tabulator is adjusted to control on the units digit and to obtain 
progressive totals on several items. The 9’s are fed into the tabulator 
first, followed by the 8’s and the other digits in order. Then the cards 
are sorted on the tens digit and progressive totals obtained on the 
same items again. If there is a hundreds digit, the process is repeated
once more. In each case a gap in the series is filled with a blank card 
to obtain a progressive total for that gap. 

For each run the last progressive total on each item is $N_p$. This
is the progressive total for the 0-cards and is not added in to obtain 
the sum of the progressive totals. 

From the tape the progressive totals for each run and each item
are summed, and for any item $\Sigma P$ is equal to the sum of the progressive
totals down through 1 on the units digit, plus 10 times the sum of
the progressive totals down through 1 on the tens digit, plus 100 times
the sum of the progressive totals down through 1 on the hundreds
digit. The formulas then are the usual formulas for biserial r:

$$ r_{bis} = \frac{\Sigma P - N_p M_t}{z N \sigma_t} $$

and, for the point biserial:

$$ r_{pbis} = \frac{\Sigma P - N_p M_t}{\sigma_t \sqrt{N_p N_q}} $$
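The digit-by-digit construction of $\Sigma P$ can be checked with a short Python sketch. For each digit position, it sums the progressive totals for digits 9 down to 1, weights that sum by 1, 10, or 100, and leaves out the 0-card total, which is $N_p$. The data and function name are hypothetical.

```python
def digitwise_sum_p(scores, passed, n_digits=3):
    """Sum of criterion scores of those passing, built digit by digit."""
    total = 0
    for position in range(n_digits):
        weight = 10 ** position          # 1 for units, 10 for tens, 100 ...
        digits = [(s // weight) % 10 for s in scores]
        run = 0
        for d in range(9, 0, -1):        # the 0-cards are not added in
            # Progressive total: passes on cards whose digit here is >= d.
            run += sum(x for dig, x in zip(digits, passed) if dig == d)
            total += weight * run
    return total


# Hypothetical criterion scores and item responses (1 = pass).
scores = [123, 98, 87, 45, 102, 7]
passed = [1, 1, 0, 1, 0, 1]
print(digitwise_sum_p(scores, passed))                 # 273
print(sum(s for s, x in zip(scores, passed) if x))     # 273
```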











